Sep 23–26, 2019

Monday, 09/23/2019

9:00am

SOLD OUT: Big data for managers

9:00am–5:00pm Monday, September 23, 2019

Training

Strata Business Summit

Michael Li (The Data Incubator), Gonzalo Diaz (The Data Incubator)

Michael Li and Gonzalo Diaz provide a nontechnical overview of AI and data science. Learn common techniques, how to apply them in your organization, and common pitfalls to avoid. You’ll pick up the language and develop a framework to be able to effectively engage with technical experts and use their input and analysis for your business’s strategic priorities and decision making. Read more.

Recommendation systems using deep learning

9:00am–5:00pm Monday, September 23, 2019

Training

Data Science, Machine Learning, & AI

Secondary topics: Deep Learning, Media and Advertising, Retail and e-commerce

Bargava Subramanian (Binaize), Amit Kapoor (narrativeVIZ)

Recommendation systems play a significant role on both sides: for users, they open up a new world of options; for companies, they drive engagement and satisfaction. Amit Kapoor and Bargava Subramanian walk you through the different paradigms of recommendation systems and introduce you to deep learning-based approaches. You'll gain the practical hands-on knowledge to build, select, deploy, and maintain a recommendation system. Read more.

Serverless machine learning with TensorFlow and BigQuery (sponsored by Google Cloud)

9:00am–5:00pm Monday, September 23, 2019

Training

Sponsored

Jeff Davis (Google Cloud)

Jeff Davis provides a hands-on introduction to designing and building machine learning models on structured data on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, you'll learn machine learning (ML) concepts and how to implement them using both BigQuery Machine Learning and TensorFlow and Keras. Read more.

Hands-on data science with Python

9:00am–5:00pm Monday, September 23, 2019

Training

Data Science, Machine Learning, & AI

Secondary topics: Deep dive into specific tools, platforms, or frameworks

Michael Cullan (Pragmatic Institute)

Michael Cullan walks you through developing a machine learning pipeline from prototyping to production. You'll learn about data cleaning, feature engineering, model building and evaluation, and deployment and then extend these models into two applications from real-world datasets. All work will be done in Python. Read more.

SOLD OUT: Building a serverless big data application on AWS

9:00am–5:00pm Monday, September 23, 2019

Training

Data Engineering and Architecture

Secondary topics: Cloud Platforms and SaaS, Data Integration and Data Processing, Data, Analytics, and AI Architecture, Deep dive into specific tools, platforms, or frameworks

Jorge Lopez (Amazon Web Services), Radhika Ravirala (Amazon Web Services), Nikki Rouda (Amazon Web Services), Jesse Gebhardt (Amazon Web Services), Rajeev Chakrabarti (Amazon Web Services)

Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. Join the AWS team to learn how to incorporate serverless concepts into your big data architectures. You'll explore design patterns to ingest, store, and analyze your data as you build a big data application using AWS technologies such as S3, Athena, Kinesis, and more. Read more.

Expand your data science and machine learning skills with Python, R, SQL, Spark, and TensorFlow

9:00am–5:00pm Monday, September 23, 2019

Training

Data Science, Machine Learning, & AI

Ian Cook (Cloudera)

Advancing your career in data science requires learning new languages and frameworks, but you face an overwhelming array of choices, each with different syntaxes, conventions, and terminology. Ian Cook simplifies the learning process by outlining the abstractions common to these systems. You'll work through hands-on exercises to overcome obstacles to getting started with new tools. Read more.

Professional Kafka development

9:00am–5:00pm Monday, September 23, 2019

Training

Data Engineering and Architecture

Secondary topics: Data Integration and Data Processing, Deep dive into specific tools, platforms, or frameworks

Jesse Anderson (Big Data Institute)

Jesse Anderson offers you an in-depth look at Apache Kafka. You'll learn how Kafka works and how to create real-time systems with it, as well as how to create consumers and publishers. Jesse then walks you through Kafka’s ecosystem, demonstrating how to use tools like Kafka Streams, Kafka Connect, and KSQL. Read more.

Machine learning from scratch in TensorFlow

9:00am–5:00pm Monday, September 23, 2019

Training

Data Science, Machine Learning, & AI

Secondary topics: Deep dive into specific tools, platforms, or frameworks, Deep Learning

Dylan Bargteil (The Data Incubator)

The TensorFlow library provides for the use of computational graphs with automatic parallelization across resources. This architecture is ideal for implementing neural networks. Dylan Bargteil explores TensorFlow's capabilities in Python, demonstrating how to build machine learning algorithms piece by piece and how to use TensorFlow's Keras API with several hands-on applications. Read more.

10:30am

10:30am–11:00am Monday, September 23, 2019

Morning break (30m)

12:30pm

12:30pm–1:30pm Monday, September 23, 2019

Lunch (1h)

3:00pm

3:00pm–3:30pm Monday, September 23, 2019

Afternoon break (30m)

7:00pm

Strata Dine-Around

7:00pm–9:00pm Monday, September 23, 2019

Event

Get to know your fellow attendees over dinner. We've made reservations for you at some of the most sought-after restaurants in town. This is a great chance to make new connections and sample some of the great cuisine New York has to offer. Read more.

Tuesday, 09/24/2019

9:00am

Machine learning for the enterprise (sponsored by IBM)

9:00am–5:00pm Tuesday, September 24, 2019

Training

Sponsored

Matt Kirk (Your Chief Scientist), Miguel Maldonado (IBM)

Note: This free workshop, courtesy of IBM, is open to the first 50 registrants. You'll take a fascinating deep dive into the power and applications of machine learning in the enterprise. Read more.

Data Case Studies

9:00am–5:00pm Tuesday, September 24, 2019

David Boyle (Audience Strategies), Richard Evans (Statistics Canada), Rosaria Silipo (KNIME), Leah Xu (Spotify), Arup Nanda (Capital One), Victoriya Kalmanovich (Navy), Tusharadri Mukherjee (Lenovo), Moise Convolbo (Rakuten), Martin Mendez-Costabel (Bayer Crop Science), Gloria Macia (F. Hoffmann-La Roche AG), Gwen Campbell (Revibe Technologies), Muhammed Idris (Capria VC | TeraCrunch)

From banking to biotech, retail to government, every business sector is changing in the face of abundant data. Get better at defining business problems and applying data solutions at Strata. Read more.

Findata Day

9:00am–5:00pm Tuesday, September 24, 2019

Alistair Croll (Solve For Interesting), Jennifer Yang (Wells Fargo ECS), Brian Lynch (TD Bank Group), Dan Barker (RSA Security), Rochelle March (Trucost), Catherine Gu (Stanford University), Karan Jaswal (Cinchy), Moto Tohda (Tokyo Century (USA)), Viridiana Lourdes (Ayasdi), Peter Swartz (Altana Trade), Mikheil Nadareishvili (TBC Bank)

From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry. Read more.

Building and leading a successful AI practice for your organization

9:00am–12:30pm Tuesday, September 24, 2019

Tutorial

Executive Briefing and best practices, Strata Business Summit

Secondary topics: Culture and Organization

Rossella Blatt Vital (Wonderlic), Ross Piper (Wonderlic), Daniel Schmerling (Wonderlic)

Creating and leading a successful ML strategy is an elegant orchestration of many components: mastering key ML concepts, operationalizing the ML workflow, prioritizing the highest-value projects, building a high-performing team, nurturing strategic partnerships, aligning with the company’s mission, and more. Rossella Blatt Vital details insights and lessons learned in how to create and lead a flourishing ML practice. Read more.

Efficient ML engineering: Tools and best practices

9:00am–12:30pm Tuesday, September 24, 2019

Tutorial

Data Science, Machine Learning, & AI

Secondary topics: Culture and Organization, Model Development, Governance, Operations

Sourav Dey (Manifold), Jakov Kucan (Manifold)

Sourav Dey and Jakov Kucan walk you through the six steps of the Lean AI process and explain how it helps your ML engineers work as an integrated part of your development and production teams. You'll get a hands-on example using real-world data, so you can get up and running with Docker and Orbyter and see firsthand how streamlined they can make your workflow. Read more.

Introduction to natural language processing in Python

9:00am–12:30pm Tuesday, September 24, 2019

Tutorial

Data Science, Machine Learning, & AI

Secondary topics: Text and Language processing and analysis

Alice Zhao (Metis)

Data scientists are known for crunching numbers, but you need to decide what to do when you run into text data. Alice Zhao walks you through the steps to turn text data into a format that a machine can understand, explores some of the most popular text analytics techniques, and showcases several natural language processing (NLP) libraries in Python, including NLTK, TextBlob, spaCy, and gensim. Read more.

Learning Presto: SQL on anything

9:00am–12:30pm Tuesday, September 24, 2019

Tutorial

Data Engineering and Architecture

Secondary topics: BI, Interactive Analytics and Visualization, Data Management and Storage, Deep dive into specific tools, platforms, or frameworks

Matt Fuller (Starburst)

Used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and others, Presto has become the ubiquitous open source software for SQL on anything. Presto was built from the ground up for fast interactive SQL analytics against disparate data sources ranging in size from GBs to PBs. Join Matt Fuller to learn how to use Presto and explore use cases and best practices you can implement today. Read more.

Serverless streaming architectures and algorithms for the enterprise

9:00am–12:30pm Tuesday, September 24, 2019

Tutorial

Data Engineering and Architecture, Streaming and IoT

Secondary topics: Cloud Platforms and SaaS, Data, Analytics, and AI Architecture, Streaming and IoT, Temporal data and time-series analytics

Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Anurag Khandelwal (Yale University)

Arun Kejariwal, Karthik Ramasamy, and Anurag Khandelwal walk you through the landscape of streaming systems and examine the inception and growth of the serverless paradigm. You'll take a deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar functions, and get a bird’s-eye view of the application domains where you can leverage Pulsar functions. Read more.

Real-time SQL stream processing at scale with Apache Kafka and KSQL

9:00am–12:30pm Tuesday, September 24, 2019

Tutorial

Data Engineering and Architecture

Secondary topics: Data Integration and Data Processing, Deep dive into specific tools, platforms, or frameworks, Streaming and IoT

Viktor Gamov (Confluent)

Building stream processing applications is certainly one of the hot topics in the IT community. But if you've ever thought you needed to be a programmer to do stream processing and build stream processing data pipelines, think again. Viktor Gamov explores KSQL, the stream processing query engine built on top of Apache Kafka. Read more.

Cloudera Edge Management in the IoT

9:00am–12:30pm Tuesday, September 24, 2019

Tutorial

Data Engineering and Architecture, Streaming and IoT

Secondary topics: Deep dive into specific tools, platforms, or frameworks, Streaming and IoT

Purnima Reddy Kuchikulla (Cloudera), Timothy Spann (Cloudera), Abdelkrim Hadjidj (Cloudera), Andre Araujo (Cloudera), Hemanth Yamijala (Cloudera)

There are too many edge devices and agents, and you need to control and manage them. Purnima Reddy Kuchikulla, Timothy Spann, Abdelkrim Hadjidj, and Andre Araujo walk you through handling the difficulty in collecting real-time data and the trouble with updating a specific set of agents with edge applications. Get your hands dirty with CEM, which addresses these challenges with ease. Read more.

Deep learning from scratch

9:00am–12:30pm Tuesday, September 24, 2019

Tutorial

Data Science, Machine Learning, & AI

Secondary topics: Deep Learning

Bruno Gonçalves (Data For Science)

You'll go hands-on to learn the theoretical foundations and principal ideas underlying deep learning and neural networks. Bruno Gonçalves provides the code structure of the implementations that closely resembles the way Keras is structured, so that by the end of the course, you'll be prepared to dive deeper into the deep learning applications of your choice. Read more.

Running multidisciplinary big data workloads in the cloud with CDP

9:00am–12:30pm Tuesday, September 24, 2019

Tutorial

Data Engineering and Architecture

Secondary topics: Cloud Platforms and SaaS, Data Management and Storage

James Morantus (Cloudera), Tony Huinker (Cloudera), Naren Koneru (Cloudera), Ramachandran Venkatesh (Cloudera), Gunther Hagleitner (Cloudera), Olli Draese (Cloudera)

Organizations now run diverse, multidisciplinary, big data workloads that span data engineering, data warehousing, and data science applications. Many of these workloads operate on the same underlying data, and the workloads themselves can be transient or long running in nature. There are many challenges with moving these workloads to the cloud. In this talk we start off with a technical deep... Read more.

Getting ready for CCPA: Securing data lakes for heavy privacy regulation

9:00am–12:30pm Tuesday, September 24, 2019

Tutorial

Security and Privacy

Secondary topics: Privacy and Security

Mark Donsky (Okera), Lars George (Okera), Michael Ernest (Dataiku), Ifigeneia Derekli (Cloudera)

New regulations drive compliance, governance, and security challenges for big data. Infosec and security groups must ensure a secured and governed environment across workloads that span on-premises, private cloud, multicloud, and hybrid cloud. Mark Donsky, Lars George, Michael Ernest, and Ifigeneia Derekli outline hands-on best practices for meeting these challenges with special attention to CCPA. Read more.

SOLD OUT: Managing the complete machine learning lifecycle with MLflow

9:00am–12:30pm Tuesday, September 24, 2019

Tutorial

Data Science, Machine Learning, & AI

Secondary topics: Model Development, Governance, Operations

Jules Damji (Databricks)

ML development brings many new complexities beyond the software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information. Jules Damji walks you through MLflow, an open source project that simplifies the entire ML lifecycle, to solve this problem. Read more.

10:30am

10:30am–11:00am Tuesday, September 24, 2019

Morning break sponsored by Microsoft (30m)

12:30pm

12:30pm–1:30pm Tuesday, September 24, 2019

Lunch (1h)

1:30pm

Managing data science in the enterprise

1:30pm–5:00pm Tuesday, September 24, 2019

Tutorial

Executive Briefing and best practices, Strata Business Summit

Secondary topics: Culture and Organization

Alexander Izydorczyk (Coatue Management), Benjamin Singleton (JetBlue), Joshua Poduska (Domino Data Lab)

The honeymoon era of data science is ending and accountability is coming. Not content to wait for results that may or may not arrive, successful data science leaders must deliver measurable impact on an increasing share of an enterprise’s KPIs. The speakers explore how leading organizations take a holistic approach to people, process, and technology to build a sustainable advantage. Read more.

Deep learning methods for natural language processing

1:30pm–5:00pm Tuesday, September 24, 2019

Tutorial

Data Science, Machine Learning, & AI

Secondary topics: Deep Learning, Financial Services, Text and Language processing and analysis

Garrett Hoffman (StockTwits)

Garrett Hoffman walks you through deep learning methods for natural language processing and natural language understanding tasks, using a live example in Python and TensorFlow with StockTwits data. Methods include Word2Vec, recurrent neural networks (RNNs) and variants (long short-term memory [LSTM] and gated recurrent unit [GRU]), and convolutional neural networks. Read more.

Natural language understanding at scale with Spark NLP

1:30pm–5:00pm Tuesday, September 24, 2019

Tutorial

Data Science, Machine Learning, & AI

Secondary topics: Deep dive into specific tools, platforms, or frameworks, Text and Language processing and analysis

David Talby (Pacific AI), Alex Thomas (John Snow Labs), Saif Addin Ellafi (John Snow Labs), Claudiu Branzan (Accenture)

David Talby, Alex Thomas, Saif Addin Ellafi, and Claudiu Branzan walk you through state-of-the-art natural language processing (NLP) using the highly performant, highly scalable open source Spark NLP library. You'll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve. Read more.

Apache Metron: Open source cybersecurity at scale

1:30pm–5:00pm Tuesday, September 24, 2019

Tutorial

Security and Privacy

Secondary topics: Privacy and Security

Carolyn Duby (Cloudera), Madhan Neethiraj (Cloudera), Michael Gregory (Cloudera), Sangeeta Doraiswamy (Cloudera)

Bring your laptop, roll up your sleeves, and get ready to crunch some cybersecurity events with Apache Metron, an open source big data cybersecurity platform. Carolyn Duby walks you through how Metron finds actionable events in real time. Read more.

From relational databases to cloud databases: Using the right tool for the right job

1:30pm–5:00pm Tuesday, September 24, 2019

Tutorial

Data Engineering and Architecture

Secondary topics: BI, Interactive Analytics and Visualization, Cloud Platforms and SaaS, Data Management and Storage, Data, Analytics, and AI Architecture

Gowrishankar Balasubramanian (Amazon Web Services), Rajeev Srinivasan (Amazon Web Services)

Enterprises adopt cloud platforms such as AWS for agility, elasticity, and cost savings. Database design and management requires a different mindset in AWS when compared to traditional RDBMS design. Gowrishankar Balasubramanian and Rajeev Srinivasan explore considerations in choosing the right database for your use case and access pattern while migrating or building a new application on the cloud. Read more.

Foundations for successful data projects

1:30pm–5:00pm Tuesday, September 24, 2019

Tutorial

Data Engineering and Architecture

Secondary topics: Culture and Organization

Ted Malaska (Capital One), Jonathan Seidman (Cloudera), Matthew Schumpert (Cloudera), Raman Rajasekhar (Cloudera), Krishna Maheshwari (Cloudera)

The enterprise data management space has changed dramatically in recent years, and this has led to new challenges for organizations in creating successful data practices. Ted Malaska and Jonathan Seidman detail guidelines and best practices from planning to implementation based on years of experience working with companies to deliver successful data projects. Read more.

Sketching data and other magic tricks

1:30pm–5:00pm Tuesday, September 24, 2019

Tutorial

Data Science, Machine Learning, & AI

Secondary topics: Streaming and IoT, Temporal data and time-series analytics

Sophie Watson (Red Hat), William Benton (Red Hat)

Go hands-on with Sophie Watson and William Benton to examine data structures that let you answer interesting queries about massive datasets in fixed amounts of space and constant time. This seems like magic, but they'll explain the key trick that makes it possible and show you how to use these structures for real-world machine learning and data engineering applications. Read more.

Architecting a data platform for enterprise use

1:30pm–5:00pm Tuesday, September 24, 2019

Tutorial

Data Engineering and Architecture

Secondary topics: BI, Interactive Analytics and Visualization, Cloud Platforms and SaaS, Data, Analytics, and AI Architecture

Mark Madsen (Teradata), Todd Walter (Archimedata)

Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that isn't subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure. Read more.

Kafka and Streams Messaging Manager (SMM) crash course

1:30pm–5:00pm Tuesday, September 24, 2019

Tutorial

Data Engineering and Architecture, Streaming and IoT

Secondary topics: Deep dive into specific tools, platforms, or frameworks, Streaming and IoT

Purnima Reddy Kuchikulla (Cloudera), Dan Chaffelson (Cloudera), Attila Kanto (Cloudera), Tony Wu (Cloudera)

Kafka is omnipresent and the backbone of streaming analytics applications and data lakes. The challenge is understanding what's going on overall in the Kafka cluster, including performance, issues, and message flows. Purnima Reddy Kuchikulla and Dan Chaffelson walk you through a hands-on experience to visualize the entire Kafka environment end-to-end and simplify Kafka operations via SMM. Read more.

Hands-on machine learning with Kafka-based streaming pipelines

1:30pm–5:00pm Tuesday, September 24, 2019

Tutorial

Data Engineering and Architecture

Secondary topics: Model Development, Governance, Operations

Boris Lublinsky (Lightbend), Dean Wampler (Anyscale)

Boris Lublinsky and Dean Wampler examine ML use in streaming data pipelines, how to do periodic model retraining, and low-latency scoring in live streams. Learn about Kafka as the data backplane, the pros and cons of microservices versus systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, metadata tracking, and more. Read more.

Building a recommender system with Amazon ML services

1:30pm–5:00pm Tuesday, September 24, 2019

Tutorial

Data Science, Machine Learning, & AI

Secondary topics: Cloud Platforms and SaaS, Deep dive into specific tools, platforms, or frameworks

Karthik Sonti (Amazon Web Services), Emily Webber (Amazon Web Services), Varun Rao Bhamidimarri (Amazon Web Services)

Karthik Sonti, Emily Webber, and Varun Rao Bhamidimarri introduce you to the Amazon SageMaker machine learning platform and provide a high-level discussion of recommender systems. You'll dig into different machine learning approaches for recommender systems, including common methods such as matrix factorization as well as newer embedding approaches. Read more.

3:00pm

3:00pm–3:30pm Tuesday, September 24, 2019

Afternoon break sponsored by Dataiku (30m)

5:00pm

Opening Reception

5:00pm–6:30pm Tuesday, September 24, 2019

Event

Enjoy delicious snacks and beverages with fellow Strata attendees, speakers, and sponsors at the Opening Reception, happening immediately after tutorials on Tuesday. Read more.

Wednesday, 09/25/2019

8:00am

Speed Networking

8:00am–8:30am Wednesday, September 25, 2019

Event

Gather before keynotes on Wednesday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with other attendees. Read more.

8:30am

8:30am–8:45am Wednesday, September 25, 2019

Early morning coffee (8:00am - 8:45am) (15m)

8:45am

Wednesday keynotes

8:45am–8:50am Wednesday, September 25, 2019

Keynote

Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)

Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the first day of keynotes. Read more.

8:50am

The road to an enterprise cloud

8:50am–9:05am Wednesday, September 25, 2019

Keynote

Mick Hollison (Cloudera), Hillery Hunter (IBM)

Learn how IBM and Cloudera are fueling innovation in IoT, streaming, data warehousing, and machine learning, and making their customers’ digital transformation journeys easier, faster, and safer. Read more.

9:05am

Recent trends in data and machine learning technologies

9:05am–9:15am Wednesday, September 25, 2019

Keynote

Ben Lorica (O'Reilly)

Ben Lorica dives into emerging technologies for building data infrastructures and machine learning platforms. Read more.

9:15am

Everything is connected and the clock is ticking: AI and big ag data for food security

9:15am–9:30am Wednesday, September 25, 2019

Keynote

Sara Menker (Gro Intelligence), Nemo Semret (Gro Intelligence)

Sara Menker, CEO, Gro Intelligence Read more.

9:30am

The future of Google Cloud data processing (sponsored by Google Cloud)

9:30am–9:40am Wednesday, September 25, 2019

Keynote

James Malone (Google)

Open source has always been a core pillar of Google Cloud’s data and analytics strategy. James Malone examines how, as the community continues to set industry standards, the company continues to integrate those standards into its services so organizations around the world can unlock the value of data faster. Read more.

9:40am

AI isn't magic. It’s computer science.

9:40am–10:00am Wednesday, September 25, 2019

Keynote

Robert Thomas (IBM), Tim O'Reilly (O'Reilly Media)

AI has the potential to add $16 trillion to the global economy by 2030, but adoption has been slow. While we understand the power of AI, many of us aren’t sure how to fully unleash its potential. Join Robert Thomas and Tim O'Reilly to learn that, in reality, AI isn't magic. It’s hard work. Read more.

10:00am

Unleash the power of data at scale (sponsored by Intel)

10:00am–10:05am Wednesday, September 25, 2019

Keynote

Jeremy Rader (Intel)

Data analytics is the long-standing but constantly evolving science that companies leverage for insight, innovation, and competitive advantage. Jeremy Rader explores Intel’s end-to-end data pipeline software strategy designed and optimized for a modern and flexible data-centric infrastructure that allows for the easy deployment of unified advanced analytics and AI solutions at scale. Read more.

10:05am

How disruptive tech is reshaping the financial services industry

10:05am–10:20am Wednesday, September 25, 2019

Keynote

Swatee Singh (American Express)

The financial services industry is increasingly using disruptive technology—including AI and machine learning, edge computing, blockchain, mobile and mixed reality, virtual assistants, and quantum computing to name a few—to enhance the customer experience and personalize their interactions with customers. Swatee Singh outlines how the same is true at American Express. Read more.

10:20am

It’s not you; it’s your database: How to unlock the full potential of your operational data (sponsored by MemSQL)

10:20am–10:25am Wednesday, September 25, 2019

Keynote

Nikita Shamgunov (MemSQL)

Data is now the world’s most valuable resource, with winners and losers decided every day by how well we collect, analyze, and act on data. However, most companies struggle to unlock the full value of their data, using outdated, outmoded data infrastructure. Nikita Shamgunov examines how businesses use data, the new demands on data infrastructure, and what you should expect from your tools. Read more.

10:25am

Cisco Data Intelligence Platform (sponsored by Cisco)

10:25am–10:30am Wednesday, September 25, 2019

Keynote

Siva Sivakumar (Cisco)

Siva Sivakumar explains the Cisco Data Intelligence Platform (CDIP), which is a cloud-scale architecture that brings together big data, AI and compute farm, and storage tiers to work together as a single entity, while also being able to scale independently to address the IT issues in the modern data center. Read more.

10:30am

Interactive sports analytics

10:30am–10:45am Wednesday, September 25, 2019

Keynote

Patrick Lucey (Stats Perform)

Imagine watching sports and being able to immediately find all plays that are similar to what just happened. Better still, imagine being able to draw a play with Xs and Os on an interface, the way a coach draws on a chalkboard, and instantaneously finding all the similar plays and conducting analytics on them. Join Patrick Lucey to see how this is possible. Read more.

10:50am

10:50am–11:20am Wednesday, September 25, 2019

Morning break sponsored by Intel (30m)

11:20am

Mass migration: Tales of moving on-premises Hadoop to Google Cloud (sponsored by Google Cloud)

11:20am–12:00pm Wednesday, September 25, 2019

Session

Sponsored

James Malone (Google)

James Malone takes a deep dive into how customers across the world partner with Google Cloud to reimagine big data processing and data lakes while generating incredible business value. Read more.

Navigating the Transition to a Data First Enterprise: an Intel perspective (sponsored by Intel)

11:20am–12:00pm Wednesday, September 25, 2019

Session

Sponsored

Jeremy Rader (Intel)

This session shares firsthand insights from an Intel analytics practitioner, describes Intel IT’s own data maturity journey, and offers actionable best known methods (BKMs) for enterprises transforming into intelligent, data-first businesses. Read more.

Building a fast, scalable, efficient operational analytics and reporting application using MemSQL, Docker, Airflow, and Prometheus (sponsored by MemSQL)

11:20am–12:00pm Wednesday, September 25, 2019

Session

Sponsored

Praveen Chitrada (Akamai Technologies)

Praveen Chitrada walks you through how Akamai uses MemSQL, Docker, Airflow, Prometheus, and other technologies as an enabler to streamline and accelerate data ingestion and calculation to generate usage metrics for billing, reporting, and analytics at massive scale. Read more.

Building a multitenant data processing and model inferencing platform with Kafka Streams

11:20am–12:00pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Data Integration and Data Processing, Data, Analytics, and AI Architecture, Retail and e-commerce, Streaming and IoT

Navinder Pal Singh Brar (Walmart Labs)

Each week 275 million people shop at Walmart, generating interaction and transaction data. Navinder Pal Singh Brar explains how the customer backbone team enables extraction, transformation, and storage of customer data to be served to other teams. At 5 billion events per day, the Kafka Streams cluster processes events from various channels and maintains a uniform identity of a customer. Read more.

Operationalizing AI and ML with Cisco Data Intelligence Platform (sponsored by Cisco)

11:20am–12:00pm Wednesday, September 25, 2019

Session

Sponsored

Chiang Yang (Cisco), Karthik Kulkarni (Cisco)

Artificial intelligence and machine learning are well beyond the laboratory exploratory stage of deployment. In fact, the speed of AI and ML deployment has a huge impact on an organization’s financial income. Chiang Yang and Karthik Kulkarni explore how the Cisco Data Intelligence Platform can help bridge the gap between AI and ML and big data. Read more.

Practical feature engineering

11:20am–12:00pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI

Ted Dunning (MapR, now part of HPE)

Feature engineering is generally the section that gets left out of machine learning books, but it's also the most critical part in practice. Ted Dunning explores techniques, a few well known, but some rarely spoken of outside the institutional knowledge of top teams, including how to handle categorical inputs, natural language, transactions, and more in the context of machine learning. Read more.

Scaling data engineers

11:20am–12:00pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Culture and Organization, Financial Services, Model Development, Governance, Operations

Evgeny Vinogradov (Yandex.Money)

With a microservice architecture, a data warehouse is the first place where all the data meets. It's supplied by many different data sources and used for many purposes—from near-online transactional processing (OLTP) to model fitting and real-time classifying. Evgeny Vinogradov details his experience in managing and scaling data for support of 20+ product teams. Read more.

Building an AI platform: Key principles and lessons learned

11:20am–12:00pm Wednesday, September 25, 2019

Session

Automation in data science and data, Data Engineering and Architecture

Secondary topics: Data, Analytics, and AI Architecture

Moty Fania (Intel)

Moty Fania details Intel’s IT experience of implementing a sales AI platform. This platform is based on streaming, microservices architecture with a message bus backbone. It was designed for real-time data extraction and reasoning and handles the processing of millions of website pages and is capable of sifting through millions of tweets per day. Read more.

Data security and privacy anti-patterns

11:20am–12:00pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture, Security and Privacy

Secondary topics: Data Management and Storage, Privacy and Security

Steven Touw (Immuta)

Anti-patterns are behaviors that take bad problems and lead to even worse solutions. In the world of data security and privacy, they’re everywhere. Over the past four years, data security and privacy anti-patterns have emerged across hundreds of customers and industry verticals—there's been an obvious trend. Steven Touw details five anti-patterns and, more importantly, the solutions for them. Read more.

Improve your data science ROI with a portfolio and risk management lens

11:20am–12:00pm Wednesday, September 25, 2019

Session

Executive Briefing and best practices, Strata Business Summit

Secondary topics: Culture and Organization

Brian Dalessandro (Capital One)

While data science value is well recognized within tech, experience across industries shows that the ability to realize and measure business impact is not universal. A core issue is that data science programs face unique risks many leaders aren’t trained to hedge against. Brian Dalessandro addresses these risks and advocates for new ways to think about and manage data science programs. Read more.

Embrace complexity: The new rules of AI

11:20am–12:00pm Wednesday, September 25, 2019

Session

Strata Business Summit

Janet Haven (Data & Society)

Join Data & Society Research Institute Executive Director Janet Haven for a deep dive into research, case studies and emerging governance approaches to creating the rules of ethical AI. Read more.

Unified tooling for machine learning interpretability

11:20am–12:00pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI, Expo Hall

Secondary topics: Ethics

Harsha Nori (Microsoft), Samuel Jenkins (Microsoft), Rich Caruana (Microsoft)

Understanding decisions made by machine learning systems is critical for sensitive uses, ensuring fairness, and debugging production models. Interpretability presents options for trying to understand model decisions. Harsha Nori, Samuel Jenkins, and Rich Caruana explore the tools Microsoft is releasing to help you train powerful, interpretable models and interpret existing black box systems. Read more.

Lightning-fast time series modeling and prediction: (S)ARIMA on steroids

11:20am–12:00pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Temporal data and time-series analytics

Meir Toledano (Anodot)

ARIMA has been used for time series modeling for decades. In practice, most time series collected from human activities exhibit seasonal patterns, but the estimation of seasonal ARIMA ((S)ARIMA) models was inefficient for decades. Meir Toledano explains how Anodot was able to apply the technique for forecasting and anomaly detection for millions of time series every day. Read more.

We run, we improve, we scale: The XGBoost story at Uber

11:20am–12:00pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Deep dive into specific tools, platforms, or frameworks, Transportation and Logistics

Nan Zhu (Uber), Felix Cheung (Uber)

XGBoost has been widely deployed in companies across the industry. Nan Zhu and Felix Cheung dive into the internals of distributed training in XGBoost and demonstrate how XGBoost solves business problems at Uber at a scale of thousands of workers and tens of TB of training data. Read more.

Kubernetes for stateful MPP systems

11:20am–12:00pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Cloud Platforms and SaaS, Data Management and Storage, Data, Analytics, and AI Architecture

Paige Roberts (Vertica), Deepak Majeti (Vertica)

GoodData needed to autorecover from node failures and scale rapidly when workloads spiked on their MPP database in the cloud. Kubernetes could solve it, but it's for stateless microservices, not a stateful MPP database that needs hundreds of containers. Paige Roberts and Deepak Majeti detail the hurdles GoodData needed to overcome in order to merge the power of the database with Kubernetes. Read more.

Executive Briefing: Why machine-learned models crash and burn in production and what to do about it

11:20am–12:00pm Wednesday, September 25, 2019

Session

Strata Business Summit

Secondary topics: Model Development, Governance, Operations

David Talby (Pacific AI)

Machine learning and data science systems often fail in production in unexpected ways. David Talby outlines real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries. Read more.

The future? Data, AI, and multicloud: It’s time to modernize (sponsored by IBM)

11:20am–12:00pm Wednesday, September 25, 2019

Session

Sponsored

Madhu Kochar (IBM)

An economic revolution is underway, driven by advancements in AI and multicloud technologies. Businesses are crafting strategic plans to modernize their data architecture for this emerging reality, and at the top of their wish list is the ability to virtualize all their data regardless of where it lives. Madhu Kochar explores the data advancements on the horizon. Read more.

12:00pm

12:00pm–1:15pm Wednesday, September 25, 2019

Lunch sponsored by Google Cloud (1h 15m)

Wednesday Business Summit Lunch

12:00pm–1:15pm Wednesday, September 25, 2019

Event

Join fellow executives, business leaders, and strategists for a networking lunch on Wednesday for Strata Business Summit attendees and speakers. Read more.

Wednesday Topic Tables at Lunch

12:00pm–1:15pm Wednesday, September 25, 2019

Event

Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

Better Together Diversity Networking Lunch

12:00pm–1:15pm Wednesday, September 25, 2019

Event

If you’d like to make new professional connections and hear ideas for supporting diversity in the tech community, come to the diversity and inclusion networking lunch on Wednesday. Read more.

12:30pm

10 things to know about running and migrating Hadoop to GCP (sponsored by Google Cloud)

12:30pm–1:10pm Wednesday, September 25, 2019

Session

Blake DuBois (Google)

Taking advantage of cloud infrastructure and analytic services is a must for any digital enterprise. Join Google Cloud as they discuss 10 things you should know about running and migrating on-prem Hadoop deployments to GCP. Read more.

1:15pm

The ugly truth about making analytics actionable (sponsored by SAS)

1:15pm–1:55pm Wednesday, September 25, 2019

Session

Sponsored

Diana Shaw (SAS)

Companies today are working to adopt data-driven mind-sets, strategies, and cultures. Yet the ugly truth is many still struggle to make analytics actionable. Diana Shaw outlines a simple, powerful, and automated solution to operationalize all types of analytics at scale. You'll learn how to put analytics into action while providing model governance and data scalability to drive real results. Read more.

Low-latency computing and stream processing for financial systems (sponsored by Hazelcast)

1:15pm–1:55pm Wednesday, September 25, 2019

Session

Sponsored

John DesJardins (Hazelcast)

In this talk, we will explore the challenges with integrating real-time stream processing and machine learning into banking and capital markets applications. Read more.

Running AI workloads in containers (sponsored by BMC Software)

1:15pm–1:55pm Wednesday, September 25, 2019

Sponsored

See-Kit Lam (Malwarebytes), Darren Chinen (Malwarebytes)

Developing, deploying and managing AI and anomaly detection models is tough business. See-Kit Lam details how Malwarebytes has leveraged containerization, scheduling, and orchestration to build a behavioral detection platform and a pipeline to bring models from concept to production. Read more.

Now you see me; now you compute: Building event-driven architectures with Apache Kafka

1:15pm–1:55pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Data, Analytics, and AI Architecture, Deep dive into specific tools, platforms, or frameworks

Michael Noll (Confluent)

Would you cross the street with traffic information that's a minute old? Certainly not. Modern businesses have the same needs. Michael Noll explores why and how you can use Kafka and its growing ecosystem to build elastic event-driven architectures. Specifically, you'll look at Kafka as the storage layer, at Kafka Connect for data integration, and at Kafka Streams and KSQL as the compute layer. Read more.

Data science isn't just another job (sponsored by Anaconda)

1:15pm–1:55pm Wednesday, September 25, 2019

Session

Sponsored

Peter Wang (Anaconda)

Peter Wang explores why data science shouldn’t be seen as merely another technical job within the business and why open source is such a critical aspect of innovation in the field of data science. Read more.

Learning with limited labeled data

1:15pm–1:55pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Deep Learning

Shioulin Sam (Cloudera Fast Forward Labs)

Supervised machine learning requires large labeled datasets, a prohibitive limitation in many real-world applications. But this could be avoided if machines could learn with a few labeled examples. Shioulin Sam explores and demonstrates an algorithmic solution that relies on collaboration between human and machine to label smartly, and she outlines product possibilities. Read more.

A productive data science platform: Beyond a hosted-notebooks solution at LinkedIn

1:15pm–1:55pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Data, Analytics, and AI Architecture, Media and Advertising

Swasti Kakker (LinkedIn), Manu Ram Pandit (LinkedIn), Vidya Ravivarma (LinkedIn)

Join Swasti Kakker, Manu Ram Pandit, and Vidya Ravivarma to explore what's offered by a flexible and scalable hosted data science platform at LinkedIn. It provides features to seamlessly develop in multiple languages, enforce developer best practices and governance policies, execute and visualize solutions, manage knowledge efficiently, and collaborate to improve developer productivity. Read more.

Sharing is caring: Using Egeria to establish true enterprise metadata governance

1:15pm–1:55pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Data quality, data governance and data lineage, Deep dive into specific tools, platforms, or frameworks

Wim Stoop (Cloudera), Srikanth Venkat (Cloudera)

Establishing enterprise-wide security and governance remains a challenge for most organizations. Integrations and exchanges across the landscape are costly to manage and maintain, and typically work in one direction only. Wim Stoop and Srikanth Venkat explore how ODPi's Egeria standard and framework removes the challenges and is leveraged by Cloudera and partners alike to deliver value. Read more.

Parquet modular encryption: Confidentiality and integrity of sensitive column data

1:15pm–1:55pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture, Security and Privacy

Secondary topics: Deep dive into specific tools, platforms, or frameworks, Health and Medicine, Privacy and Security

Gidon Gershinsky (IBM)

The Apache Parquet community is working on a column encryption mechanism that protects sensitive data and enables access control for table columns. Many companies are involved, and the mechanism specification has recently been signed off on by the community management committee. Gidon Gershinsky explores the basics of Parquet encryption technology, its usage model, and a number of use cases. Read more.

Turning petabytes of data from millions of vehicles into open data with Geotab

1:15pm–1:55pm Wednesday, September 25, 2019

Session

Case studies, Strata Business Summit

Secondary topics: BI, Interactive Analytics and Visualization, Cloud Platforms and SaaS, Streaming and IoT

Felipe Hoffa (Google), Bob Bradley (Geotab)

Geotab is a world-leading asset-tracking company with millions of vehicles under service every day. Felipe Hoffa and Bob Bradley examine the challenges and solutions to create an ML- and geographic information system (GIS)-enabled petabyte-scale data warehouse leveraging Google Cloud. They also dive into the process of publishing open data, how you can access it, and how cities are using it. Read more.

War stories from the front lines of ML

1:15pm–1:55pm Wednesday, September 25, 2019

Session

Law and Ethics, Strata Business Summit

Secondary topics: Ethics, Privacy and Security

Andrew Burt (bnh.ai), Brenda Leong (Future of Privacy Forum), David Florsek (IDEMIA NSS), Alex Beutel (Google Brain), Chris Wheeler (Mastercard)

Machine learning techniques are being deployed across almost every industry and sector. But this adoption comes with real, and oftentimes underestimated, privacy and security risks. Andrew Burt and Brenda Leong convene a panel of experts including David Florsek, Chris Wheeler, and Alex Beutel to detail real-life examples of when ML goes wrong, and the lessons they learned. Read more.

Feature engineering with Spark NLP to accelerate clinical trial recruitment

1:15pm–1:55pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI, Expo Hall

Secondary topics: Health and Medicine, Text and Language processing and analysis

Saif Addin Ellafi (John Snow Labs), Scott Hoch (BlackBox Engineering)

Recruiting patients for clinical trials is a major challenge in drug development. Saif Addin Ellafi and Scott Hoch explain how Deep 6 uses Spark NLP to scale its training and inference pipelines to millions of patients while achieving state-of-the-art accuracy. They dive into the technical challenges, the architecture of the full solution, and the lessons the company learned. Read more.

Improving OCR quality of documents using generative adversarial networks

1:15pm–1:55pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Deep Learning, Financial Services, Health and Medicine

Nagendra Shishodia (EXL), Chaithanya Manda (EXL), Solmaz Torabi (EXL)

Every NLP-based document-processing solution depends on converting scanned documents and images to machine-readable text using an OCR solution, which is limited by the quality of the scanned images. Nagendra Shishodia, Chaithanya Manda, and Solmaz Torabi explore how GANs can bring significant efficiencies in any document-processing solution by enhancing resolution and denoising scanned images. Read more.

Machine learning and large-scale data analysis on a centralized platform

1:15pm–1:55pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Data, Analytics, and AI Architecture, Financial Services, Retail and e-commerce

James Tang (Walmart Labs), Yiyi Zeng (Walmart Labs), Linhong Kang (Walmart Labs)

James Tang, Yiyi Zeng, and Linhong Kang outline how Walmart provides a secure and seamless shopping experience through machine learning and large-scale data analysis on a centralized platform. Read more.

Your easy move to serverless computing and radically simplified data processing

1:15pm–1:55pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Cloud Platforms and SaaS, Data Integration and Data Processing

Gil Vernik (IBM)

Most analytic flows can benefit from serverless, starting with simple cases and moving to complex data preparations for AI frameworks like TensorFlow. To address the challenge of how to easily integrate serverless without major disruptions to your system, Gil Vernik explores the “push to the cloud” experience, which dramatically simplifies serverless for big data processing frameworks. Read more.

Executive Briefing: Top 10 big data blunders

1:15pm–1:55pm Wednesday, September 25, 2019

Session

Executive Briefing and best practices, Strata Business Summit

Secondary topics: Culture and Organization, Data Management and Storage, Data, Analytics, and AI Architecture

Michael Stonebraker (Tamr)

As a steward for your enterprise’s data and digital transformation initiatives, you’re tasked with making the right choice. But before you can make those decisions, it’s important to understand what not to do when planning for your organization’s big data initiatives. Michael Stonebraker shares his top 10 big data blunders. Read more.

AI/ML on Oracle Cloud with Kinetica and H2O.ai (sponsored by Oracle Cloud Infrastructure)

1:15pm–1:55pm Wednesday, September 25, 2019

Session

Sponsored

Ben Lackey (Oracle)

Learn about running AI/ML solutions like H2O.ai and Kinetica on Oracle Cloud. The session will include a live demo of Terraform, Oracle Cloud Infrastructure, GPUs and Oracle Marketplace. We’ll discuss other leading Data and AI products including Cloudera, DataStax and Confluent. Read more.

2:05pm

Bringing together machine and human intelligence in business applications at enterprise scale (sponsored by SAP)

2:05pm–2:45pm Wednesday, September 25, 2019

Session

Sponsored

Kevin Poskitt (SAP), Andreas Wesselmann (SAP)

Oftentimes there's a fracture between the highly governed data of enterprise IT systems and the comprehensive but often ungoverned world of large-scale data lakes and streams of data from blogs, system logs, sensors, IoT devices, and more. Kevin Poskitt and Andreas Wesselmann walk you through how AI needs to connect to all of this data, as well as image, video, audio, and text data sources. Read more.

Solving for enterprise scale analytics and agile data operations (sponsored by Infoworks)

2:05pm–2:45pm Wednesday, September 25, 2019

Session

Sponsored

Amar Arsikere (infoworks.io)

The breakneck pace of business change and its insatiable appetite for data and analytics to drive digital transformation make agile use of data an imperative. Read more.

ALDO’s data strategy to create the right customer experience for its consumers (sponsored by Talend)

2:05pm–2:45pm Wednesday, September 25, 2019

Session

Sponsored

Aaron Swanson (Talend)

Winning the hearts and minds of millennials and Gen Z is not an easy task. ALDO has devised a data-driven strategy to create the best consumer experience. Today ALDO relies on Talend and AWS. Aaron Swanson explains the choices made for its data architecture and the hurdles the teams had to solve to turn the vision into reality. Read more.

2:05pm–2:45pm Wednesday, September 25, 2019

TBC

Mastercard and Pitney Bowes: Creating a data-driven business (sponsored by Pitney Bowes)

2:05pm–2:45pm Wednesday, September 25, 2019

Session

Sponsored

Olga Lagunova (Pitney Bowes), John Derrico (Mastercard)

Mastercard and Pitney Bowes have overcome many challenges on their journey to accelerate innovation, achieve efficiencies, and improve the overall customer experience. Olga Lagunova and John Derrico share lessons learned as the data strategy evolved and highlight pitfalls and solutions from data science projects across several industries, from finance to cross-border shipping logistics. Read more.

Fair, privacy-preserving, and secure ML

2:05pm–2:45pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI, Security and Privacy

Secondary topics: Ethics, Privacy and Security, Retail and e-commerce

Mikio Braun (Zalando)

With ML becoming more mainstream, the side effects of machine learning and AI on our lives become more visible. You have to take extra measures to make machine learning models fair and unbiased. And awareness for preserving the privacy in ML models is rapidly growing. Mikio Braun explores techniques and concepts around fairness, privacy, and security when it comes to machine learning models. Read more.

From raw data to informed intelligence: Democratizing data science and ML at Uber

2:05pm–2:45pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Data, Analytics, and AI Architecture, Transportation and Logistics

Atul Gupte (Uber)

Uber is changing the way people think about transportation. As an integral part of the logistical fabric in 65+ countries around the world, it uses ML and advanced data science to power every aspect of the Uber experience—from dispatch to customer support. Atul Gupte and Nikhil Joshi explore how Uber enables teams to transform insights into intelligence and facilitate critical workflows. Read more.

The evolution of metadata: LinkedIn’s story

2:05pm–2:45pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Data quality, data governance and data lineage, Media and Advertising

Shirshanka Das (LinkedIn), Mars Lan (LinkedIn)

Imagine scaling metadata to an organization of 10,000 employees, 1M+ data assets, and an AI-enabled company that ships code to the site three times a day. Shirshanka Das and Mars Lan dive into LinkedIn’s metadata journey from a two-person back-office team to a central hub powering data discovery, AI productivity, and automatic data privacy. They reveal metadata strategies and the battle scars. Read more.

Building a best-in-class data lake on AWS and Azure

2:05pm–2:45pm Wednesday, September 25, 2019

Session

Business Analytics and Visualization, Data Engineering and Architecture

Secondary topics: BI, Interactive Analytics and Visualization, Cloud Platforms and SaaS, Data Management and Storage

Tomer Shiran (Dremio), Jacques Nadeau (Dremio)

Data lakes have become a key ingredient in the data architecture of most companies. In the cloud, object storage systems such as S3 and ADLS make it easier than ever to operate a data lake. Tomer Shiran and Jacques Nadeau explain how you can build best-in-class data lakes in the cloud, leveraging open source technologies and the cloud's elasticity to run and optimize workloads simultaneously. Read more.

Executive Briefing: Usable machine learning—Lessons from Stanford and beyond

2:05pm–2:45pm Wednesday, September 25, 2019

Session

Strata Business Summit

Peter Bailis (Sisu | Stanford University)

Despite a meteoric rise in data volumes within modern enterprises, enabling nontechnical users to put this data to work in diagnostic and predictive tasks remains a fundamental challenge. Peter Bailis details the lessons learned in building new systems to help users leverage the data at their disposal, drawing on production experience from Facebook, Microsoft, and the Stanford DAWN project. Read more.

Regulations and the future of data

2:05pm–2:45pm Wednesday, September 25, 2019

Session

Security and Privacy, Strata Business Summit

Secondary topics: Ethics, Privacy and Security

Andrew Burt (bnh.ai), Brenda Leong (Future of Privacy Forum), Boris Segalis (Cooley), Susan Israel (Loeb & Loeb, LLP)

From the EU to California and China, more of the world is regulating how data can be used. Andrew Burt and Brenda Leong convene leading experts on law and data science for a deep dive into ways to regulate the use of AI and advanced analytics. Come learn why these laws are being proposed, how they’ll impact data, and what the future has in store. Read more.

Mind the semantic gap: How "talking semantics" can help you perform better data science

2:05pm–2:45pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI, Expo Hall

Secondary topics: Text and Language processing and analysis

Panos Alexopoulos (Textkernel)

In an era where discussions among data scientists are monopolized by the latest trends in machine learning, the role of semantics in data science is often underplayed. Panos Alexopoulos presents real-world cases where making fine, seemingly pedantic, distinctions in the meaning of data science tasks and the related data has significantly improved their effectiveness and value. Read more.

Real-time anomaly detection on observability data using neural networks

2:05pm–2:45pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Deep Learning, Temporal data and time-series analytics, Transportation and Logistics

Keshav Peswani (Expedia Group), Ashish Aggarwal (Expedia Group)

Observability is the key in modern architecture to quickly detect and repair problems in microservices. Modern observability platforms have evolved beyond simple application logs and include distributed tracing systems like Zipkin and Haystack. Keshav Peswani and Ashish Aggarwal explore how combining them with real-time, intelligent alerting mechanisms helps in the automated detection of problems. Read more.

Data science versus engineering: Does it really have to be this way?

2:05pm–2:45pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Culture and Organization

Ann Spencer (Domino), Amy Heineike (Primer), Paco Nathan (derwen.ai), Chris Wiggins (NYT | Columbia)

If, as a data scientist, you've wondered why it takes so long to deploy your model into production or, as an engineer, thought data scientists have no idea what they want, you're not alone. Join a lively discussion with industry veterans Ann Spencer, Paco Nathan, Amy Heineike, and Chris Wiggins to find best practices or insights on increasing collaboration when developing and deploying models. Read more.

Orchestrating data workflows using a fully serverless architecture

2:05pm–2:45pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Cloud Platforms and SaaS, Data, Analytics, and AI Architecture

Tomer Levi (Fundbox)

Use of data workflows is a fundamental functionality of any data engineering team. Nonetheless, designing an easy-to-use, scalable, and flexible data workflow platform is a complex undertaking. Tomer Levi walks you through how the data engineering team at Fundbox uses AWS serverless technologies to address this problem and how it enables data scientists, BI devs, and engineers to move faster. Read more.

Executive Briefing: Building a data-assisted organization

2:05pm–2:45pm Wednesday, September 25, 2019

Session

Strata Business Summit

Secondary topics: Culture and Organization, Financial Services

Arup Nanda (Capital One)

Every organization wants to use data more effectively and as a weapon, but few succeed. Arup Nanda explores how Priceline started on this journey and how it was successful using different techniques and tools. Join in to learn how to streamline data assets, make it easier for end users, define KPIs, create value from data, and build sponsorships to build a data organization. Read more.

2:55pm

Take the bias out of big data insights with augmented analytics (sponsored by Kyligence)

2:55pm–3:35pm Wednesday, September 25, 2019

Session

Sponsored

Dong Li (Kyligence), Hongbin Ma (Kyligence)

Your analytics are biased. Efforts to extract meaning by manually scrubbing, indexing, and parsing big data are limited by time, cost, and human assumptions. Dong Li and Hongbin Ma offer an overview of augmented analytics. It takes OLAP into the future with AI, ensuring objective and unique insights that cover all relevant scenarios found in petabytes of multidimensional and variable data. Read more.

Migrating Apache Spark and Hive from on-premises to Amazon EMR (sponsored by Amazon Web Services)

2:55pm–3:35pm Wednesday, September 25, 2019

Session

Sponsored

Radhika Ravirala (Amazon Web Services)

Radhika Ravirala explains how to migrate your workloads to Amazon EMR. Join in to learn the key motivations and benefits from a move to the cloud, along with the architectural changes required and best practices you can use right away. Read more.

Architecting a data analytics service both in the public cloud and in the on-premise private cloud: ETL, BI, and machine learning (sponsored by SK Holdings)

2:55pm–3:35pm Wednesday, September 25, 2019

Session

Sponsored

Jungwook Seo (SK Holdings)

Jungwook Seo walks you through a data analytics platform in the cloud by the name of AccuInsight+ with eight data analytic services in the CloudZ (one of the biggest cloud service providers in Korea), which SK Holdings announced in January 2019. Read more.

How Orange Financial combats financial fraud over 50M transactions a day using Apache Pulsar

2:55pm–3:35pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Data, Analytics, and AI Architecture, Financial Services, Streaming and IoT, Telecom

Weisheng Xie (Orange Financial), Jia Zhai (StreamNative)

Orange Financial is a fintech company of China Telecom with half a billion registered users and 41 million monthly active users, and risk control decision deployment has been critical to its success. Weisheng Xie and Jia Zhai explore how the company leverages Apache Pulsar to boost the efficiency of its risk control decision development to combat financial fraud across more than 50 million transactions a day. Read more.

See what others can’t with spatial analysis and data science (sponsored by Esri)

2:55pm–3:35pm Wednesday, September 25, 2019

Session

Sponsored

Shannon Kalisky (Esri), Alberto Nieto (Esri)

Digital location data is a crucial part of data science. The "where" matters as much to an analysis as the "what" and the "why." Shannon Kalisky and Alberto Nieto explore tools that help you apply a range of geospatial techniques in your data science workflows to get deeper insights. Read more.

How machine learning meets optimization

2:55pm–3:35pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Financial Services

Jari Koister (FICO)

Machine learning and constraint-based optimization are both used to solve critical business problems. They come from distinct research communities and have traditionally been treated separately. But Jari Koister examines how they're similar, how they're different, and how they can be used to solve complex problems with amazing results. Read more.

Creating a data engineering culture

2:55pm–3:35pm Wednesday, September 25, 2019

Session

Culture and organization

Jesse Anderson (Big Data Institute)

In this talk, we will cover the most common reasons why data engineering teams fail and how to correct them. This will include ways to get your management to understand that data engineering is really complex and time consuming. It is not data warehousing with new names. Management needs to understand that you can’t compare a data engineering team to the web development team, for example. Read more.

Turning big data into knowledge: Managing metadata and data relationships at Uber's scale

2:55pm–3:35pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Data quality, data governance and data lineage, Transportation and Logistics

Kaan Onuk (Uber), Luyao Li (Uber), Atul Gupte (Uber)

Uber takes data driven to the next level. A robust system for discovering and managing various entities, from datasets to services to pipelines, along with their relevant metadata, isn't just nice to have; it's integral to making data useful. Kaan Onuk, Luyao Li, and Atul Gupte explore the current state of metadata management, end-to-end data flow solutions at Uber, and what’s coming next. Read more.

When machines fight machines: Cyberbattles and the new frontier of artificial intelligence

2:55pm–3:35pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture, Security and Privacy

Secondary topics: Privacy and Security

Marcus Fowler (Darktrace)

Cybersecurity must find what it doesn’t know to look for. AI technologies have led to the emergence of self-learning, self-defending networks that achieve this by detecting and autonomously responding to in-progress attacks in real time. Marcus Fowler examines how these cyber-immune systems enable the security team to focus on high-value tasks, counter even machine-speed threats, and work in all environments. Read more.

Enabling 5G use cases through location intelligence

2:55pm–3:35pm Wednesday, September 25, 2019

Session

Case studies, Strata Business Summit

Secondary topics: Streaming and IoT, Telecom, Transportation and Logistics

Tim McKenzie (Pitney Bowes)

Tim McKenzie examines why planning 5G network rollout and associated services requires a good understanding of location-based data. Accurate addressing and linking consumers to property or points of interest allows data enrichment with attributes, demographics and social data. Companies use location to organize and analyze network and customer data to understand where to target new services. Read more.

Are your privacy practices auditor approved?

2:55pm–3:35pm Wednesday, September 25, 2019

Session

Security and Privacy, Strata Business Summit

Secondary topics: Privacy and Security

Mark Hinely (KirkpatrickPrice)

The fear that comes along with new compliance requirements is overwhelming. Organizations don’t know where to start, what to fix, or what an auditor expects to see. Mark Hinely gives you an auditor's perspective on the newest security and privacy regulations, how your business can prepare for compliance, and what the audit looks like to an auditor. Read more.

Toward more fine-grained sentiment and emotion analysis of text

2:55pm–3:35pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI, Expo Hall

Secondary topics: Text and Language processing and analysis

Gerard de Melo (Rutgers University)

Gerard de Melo takes a deep dive into the kinds of sentiment and emotion consumers associate with a text. With new data-driven approaches, organizations can better pay attention to what's being said about them in different markets. And you can consider fonts and palettes best suited to convey specific emotions, so organizations can make informed choices when presenting information to consumers. Read more.

Introducing a new anomaly detection algorithm (SR-CNN) inspired by computer vision

2:55pm–3:35pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Deep Learning, Temporal data and time-series analytics

Tony Xing (Microsoft), Congrui Huang (Microsoft), Qiyang Li (Microsoft), Wenyi Yang (Microsoft)

Anomaly detection may sound old fashioned, yet it's super important in many industry applications. Tony Xing, Congrui Huang, Qiyang Li, and Wenyi Yang detail a novel anomaly-detection algorithm based on spectral residual (SR) and convolutional neural network (CNN) and how this method was applied in the monitoring system supporting Microsoft AIOps and business incident prevention. Read more.

Building a machine learning framework to measure TV advertising attribution

2:55pm–3:35pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Media and Advertising, Retail and e-commerce

Fei Wang (CarGurus)

Fei Wang takes a deep dive into a case study of the CarGurus TV Attribution Model. You'll learn how to build a causal inference model to calculate the cost per acquisition (CPA) of TV spend and measure its effectiveness against the CPA of digital performance marketing spend. Read more.

Time travel for data pipelines: Solving the mystery of what changed

2:55pm–3:35pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Data Integration and Data Processing, Data quality, data governance and data lineage

Shradha Ambekar (Intuit), Sunil Goplani (Intuit), Sandeep Uttamchandani (Intuit)

A business insight shows a sudden spike. It can take hours, or days, to debug data pipelines to find the root cause. Shradha Ambekar, Sunil Goplani, and Sandeep Uttamchandani outline how Intuit built a self-service tool that automatically discovers data pipeline lineage and tracks every change, helping debug the issues in minutes—establishing trust in data while improving developer productivity. Read more.

Executive Briefing: Understanding the cult of prediction

2:55pm–3:35pm Wednesday, September 25, 2019

Session

Executive Briefing and best practices, Strata Business Summit

Secondary topics: Ethics

Farrah Bostic (The Difference Engine)

We're living in a culture obsessed with predictions. In politics and business, we collect data in service of the obsession. But our need for certainty and control leads some organizations to be duped by unproven technology or pseudoscience—often with unforeseen societal consequences. Farrah Bostic looks at historical—and sometimes funny—examples of sacrificing understanding for "data." Read more.

How to deploy large-scale distributed data analytics and machine learning on containers (sponsored by HPE (BlueData))

2:55pm–3:35pm Wednesday, September 25, 2019

Session

Sponsored

Anant Chintamaneni (HPE (BlueData)), Matt Maccaux (HPE (BlueData))

Anant Chintamaneni and Matt Maccaux explore whether the combination of containers with large-scale distributed data analytics and machine learning applications is like combining oil and water, or like peanut butter and chocolate. Read more.

3:35pm

3:35pm–4:35pm Wednesday, September 25, 2019

Afternoon break sponsored by MemSQL (1h)

4:35pm

DevOps in the cloud: Deploy, monitor, manage and automate (sponsored by Impetus)

4:35pm–5:15pm Wednesday, September 25, 2019

Session

Sponsored

Amit Assudani (Impetus)

Data lakes and analytical processing on the cloud are a reality. This presents new challenges for DevOps with respect to governance, continuous integration and deployment, and more. This session will present our views on how to maintain sanity in your development organization while implementing the many dimensions of building an efficient cloud-based data platform and application development environment. Read more.

Clean the swamp: Gain greater visibility, speed, and governance with data ops (sponsored by Hitachi Vantara)

4:35pm–5:15pm Wednesday, September 25, 2019

Session

Sponsored

Chuck Yarbrough (Hitachi Vantara)

According to Gartner, over 80% of data lake projects were deemed inefficient. Data lakes come and go. Swamps happen. Data agility is fleeting. Chuck Yarbrough walks you through how data ops practices and a modern data architecture bring greater visibility and allow faster data access with proper governance. Read more.

Trill: The crown jewel of Microsoft’s streaming pipeline explained

4:35pm–5:15pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture, Streaming and IoT

Secondary topics: Cloud Platforms and SaaS, Data Integration and Data Processing, Media and Advertising, Streaming and IoT

James Terwilliger (Microsoft Corporation), Badrish Chandramouli (Microsoft Research), Jonathan Goldstein (Microsoft Research)

Trill has been open-sourced, making the streaming engine behind services like the Bing Ads platform available for all to use and extend. James Terwilliger, Badrish Chandramouli, and Jonathan Goldstein dive into the history of and insights from streaming data at Microsoft. They demonstrate how its API can power complex application logic and the performance that gives the engine its name. Read more.

Semantics and graph data models in the enterprise data fabric (sponsored by Cambridge Semantics)

4:35pm–5:15pm Wednesday, September 25, 2019

Session

Sponsored

Barbara Petrocelli (Cambridge Semantics), Peter Ball (Consultant)

Join industry consultant Peter Ball, of Liminal Innovation, and Barbara Petrocelli, VP Field Operations of Cambridge Semantics, to learn how enterprise data fabrics are reshaping the modern data management landscape. Read more.

Predicting Criteo’s internet traffic load using Bayesian structural time series models

4:35pm–5:15pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Media and Advertising, Temporal data and time-series analytics

Hamlet Jesse Medina Ruiz (Criteo)

Criteo’s infrastructure provides the capacity and connectivity to host Criteo’s platform and applications. The evolution of this infrastructure is driven by the ability to forecast Criteo’s traffic demand. Hamlet Jesse Medina Ruiz explains how Criteo uses Bayesian dynamic time series models to accurately forecast its traffic load and optimize hardware resources across data centers. Read more.

Downscaling: The Achilles heel of autoscaling Spark clusters

4:35pm–5:15pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Deep dive into specific tools, platforms, or frameworks

Prakhar Jain (Microsoft), Sourabh Goyal (Qubole)

Autoscaling of resources aims to achieve low latency for a big data application while reducing resource costs. Upscaling a cluster in the cloud is fairly easy compared with downscaling nodes; because downscaling is harder, the overall total cost of ownership (TCO) goes up. Prakhar Jain and Sourabh Goyal examine a new design for efficient downscaling, which helps achieve better resource utilization and lower TCO. Read more.

The case for a common metadata layer for machine learning platforms

4:35pm–5:15pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Data quality, data governance and data lineage

Max Neunhöffer (ArangoDB), Joerg Schad (ArangoDB)

Machine learning platforms are becoming more complex, with different components each producing their own metadata and their own way of storing metadata. Max Neunhöffer and Joerg Schad propose a first draft of a common metadata API and demonstrate a first implementation of this API in Kubeflow using ArangoDB, a native multimodel database. Read more.

Protecting the healthcare enterprise from PHI breaches using streaming and NLP

4:35pm–5:15pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Health and Medicine, Privacy and Security

Jeff Zemerick (Mountain Fog)

Hospitals small and large are adopting cloud technologies, and many are in hybrid environments. These distributed environments pose challenges, none of which are more critical than the protection of protected health information (PHI). Jeff Zemerick explores how open source technologies can be used to identify and remove PHI from streaming text in an enterprise healthcare environment. Read more.

What does the public say? A computational analysis of regulatory comments

4:35pm–5:15pm Wednesday, September 25, 2019

Session

Case studies, Strata Business Summit

Secondary topics: Text and Language processing and analysis

Vlad Eidelman (FiscalNote)

While regulations affect your life every day, and millions of public comments are submitted to regulatory agencies in response to their proposals, analyzing the comments has traditionally been reserved for legal experts. Vlad Eidelman outlines how natural language processing (NLP) and machine learning can be used to automate the process by analyzing over 10 million publicly released comments. Read more.

Supercharging Elasticsearch for extended Knowledge Graph use cases

4:35pm–5:15pm Wednesday, September 25, 2019

Session

Business Analytics and Visualization, Strata Business Summit

Secondary topics: BI, Interactive Analytics and Visualization, Data Management and Storage, Deep dive into specific tools, platforms, or frameworks

Giovanni Tummarello (Siren)

Elasticsearch (ES) allows extremely quick search and drilldowns on large amounts of semistructured data. Elasticsearch, however, does not have relational join capabilities. Giovanni Tummarello examines a plug-in for ES that adds cluster distributed joins and demonstrates how it enables an exciting array of use cases dealing with interconnected or "Knowledge Graph" enterprise data. Read more.

Search logs + machine learning = autotagged inventory

4:35pm–5:15pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI, Expo Hall

Secondary topics: Text and Language processing and analysis

John Berryman (Eventbrite)

Eventbrite is exploring a new machine learning approach that allows it to harvest data from customer search logs and automatically tag events based upon their content. John Berryman dives into the results and how they have allowed the company to provide users with a better inventory-browsing experience. Read more.

Deep learning on mobile

4:35pm–5:15pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Data Integration and Data Processing, Deep Learning, Financial Services

Anirudh Koul (Microsoft), Meher Kasam (Square)

Over the last few years, convolutional neural networks (CNNs) have risen in popularity, especially in the area of computer vision. Anirudh Koul and Meher Kasam take you through how you can get deep neural nets to run efficiently on mobile devices. Read more.

From whiteboard to production: A demand forecasting system for an online grocery shop

4:35pm–5:15pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Retail and e-commerce, Temporal data and time-series analytics

Robert Pesch (inovex), Robin Senge (inovex)

Data-driven software is revolutionizing the world and enabling the intelligent services we interact with daily. Robert Pesch and Robin Senge outline the development process, statistical modeling, data-driven decision making, and components needed for productionizing a fully automated and highly scalable demand forecasting system for an online grocery shop for a billion-dollar retail group in Europe. Read more.

Apache Hadoop 3.x state of the union and upgrade guidance

4:35pm–5:15pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Deep dive into specific tools, platforms, or frameworks

Wangda Tan (Cloudera), Wei-Chiu Chuang (Cloudera)

Wangda Tan and Wei-Chiu Chuang outline the current status of the Apache Hadoop community and dive into the present and future of Hadoop 3.x. You'll get a peek at new features like erasure coding, GPU support, NameNode federation, Docker, long-running services support, powerful container placement constraints, data node disk balancing, etc. And they walk you through upgrade guidance from 2.x to 3.x. Read more.

Executive Briefing: Data catalogs—Concepts, capabilities, and key platforms

4:35pm–5:15pm Wednesday, September 25, 2019

Session

Executive Briefing and best practices, Strata Business Summit

Secondary topics: Data quality, data governance and data lineage

Andrew Brust (Blue Badge Insights | ZDNet)

Andrew Brust provides a primer on data catalogs and a review of the major vendors and platforms in the market. He examines the use of data catalogs with classic and newer data repositories, including data warehouses, data lakes, cloud object storage, and even software and applications. You'll learn about AI's role in the data catalog world and get an analysis of data catalog futures. Read more.

Solve tomorrow’s business challenges with a modern data warehouse (sponsored by Matillion)

4:35pm–5:15pm Wednesday, September 25, 2019

Session

Sponsored

Daniel D'Orazio (Matillion)

According to Forrester, insight-driven companies are on pace to make $1.8 trillion annually by 2021. Daniel D'Orazio wants to know how fast your team can collect, process, and analyze data to solve present—and future—business challenges. You'll gain actionable tips and lessons learned from cloud data warehouse modernizations at companies like DocuSign that you can take back to your business. Read more.

5:25pm

Harnessing graph-native algorithms to enhance machine learning: A primer

5:25pm–6:05pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Transportation and Logistics

Brandy Freitas (Pitney Bowes)

Brandy Freitas examines the interplay between graph analytics and machine learning, improved feature engineering with graph native algorithms, and how to harness the power of graph structure for machine learning through node embedding. Read more.

The why and how of data lineage

5:25pm–6:05pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Data quality, data governance and data lineage, Retail and e-commerce

Neelesh Salian (Stitch Fix)

Every data team has to build an ecosystem that sustains the data, the users, and the use of the data itself. This data ecosystem comes with its own challenges during the building phase, maintenance, and enhancement. Neelesh Salian dives into the importance of data lineage for an organization. You'll explore how to go about building such a system. Read more.

The future of Hadoop in an era of exponentially growing data (sponsored by SQream)

5:25pm–6:05pm Wednesday, September 25, 2019

Session

Sponsored

David Leichner (SQream)

What started as an asset for data scientists and BI professionals has become a poorly performing problem. David Leichner explores the Hadoop ecosystem and relational databases from an analytics perspective—reviewing the current landscape, what Hadoop was designed for, and how a Hadoop-based infrastructure can be improved to support a new era of exponentially growing data. Read more.

Fast data with the KISSS stack

5:25pm–6:05pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Data, Analytics, and AI Architecture, Streaming and IoT

Bas Geerdink (Aizonic)

Streaming analytics (or fast data processing) is the field of making predictions based on real-time data. Bas Geerdink presents a fast data architecture that covers many use cases that follow a "pipes and filters" pattern. This architecture can be used to create enterprise-grade solutions with a diversity of technology options. The stack is Kafka, Ignite, and Spark Structured Streaming (KISSS). Read more.

Challenges faced in machine learning infrastructure in traditional large enterprises

5:25pm–6:05pm Wednesday, September 25, 2019

Session

Automation in data science and data, Data Engineering and Architecture, Data Science, Machine Learning, & AI

Secondary topics: Data, Analytics, and AI Architecture, Media and Advertising, Model Development, Governance, Operations

Venkata Gunnu (Comcast), Harish Doddi (Datatron)

Machine learning infrastructure is key to the success of AI at scale in enterprises, but bringing machine learning models to a production environment poses many challenges given the legacy of the enterprise environment. Venkata Gunnu and Harish Doddi explore some key insights, what worked, what didn't work, and best practices that helped the data engineering and data science teams. Read more.

Causal inference 101: Answering the crucial "why" in your analysis

5:25pm–6:05pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Retail and e-commerce

Subhasish Misra (Walmart Labs)

Causal questions are ubiquitous, and randomized tests are considered the gold standard. However, such tests are not always feasible, and then you have only observational data from which to draw causal insights. Techniques such as matching offer a way forward. Subhasish Misra explores these techniques and shares practical tips for inferring causal effects. Read more.

Improving Spark by taking advantage of disaggregated architecture

5:25pm–6:05pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Data, Analytics, and AI Architecture, Deep dive into specific tools, platforms, or frameworks

Chenzhao Guo (Intel), Carson Wang (Intel)

Shuffle in Spark requires the shuffle data to be persisted on local disks. However, the assumptions of collocated storage do not always hold in today’s data centers. Chenzhao Guo and Carson Wang outline the implementation of a new Spark shuffle manager, which writes shuffle data to a remote cluster with different storage backends, making life easier for customers. Read more.

Finding your needle in a haystack

5:25pm–6:05pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture

Secondary topics: Data quality, data governance and data lineage, Data, Analytics, and AI Architecture

Naghman Waheed (Bayer Crop Science), John Cooper (Bayer)

As the complexity of data systems has grown at Bayer, so has the difficulty of locating and understanding what datasets are available for consumption. Naghman Waheed and John Cooper outline a custom metadata management tool recently deployed at Bayer. The system is cloud-enabled and uses multiple open source components, including machine learning and natural language processing to aid searches. Read more.

Secured computation: Analyzing sensitive data using homomorphic encryption

5:25pm–6:05pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture, Security and Privacy

Secondary topics: Media and Advertising, Privacy and Security

Matt Carothers (Cox Communications), Jignesh Patel (Cox Communications), Harry Tang (Cox Communications)

Organizations often work with sensitive information such as social security and credit card numbers. Although this data is stored in encrypted form, most analytical operations require data decryption for computation. This creates unwanted exposure to theft or unauthorized reads. Matt Carothers, Jignesh Patel, and Harry Tang explain how homomorphic encryption prevents fraud. Read more.

How Brazil deployed a 160 million-person biometric identification system: Challenges, benefits, and lessons learned

5:25pm–6:05pm Wednesday, September 25, 2019

Session

Case studies, Strata Business Summit

Secondary topics: Data, Analytics, and AI Architecture, Health and Medicine

Thiago Ribeiro (Griaule)

Brazil deployed a national biometric system to register all Brazilian voters using multiple biometric modalities and to ensure that a person does not enroll twice. This session highlights how a large-scale biometric system works and the main architecture decisions that one has to take into consideration. Read more.

Looking beyond the binary: How data for development impacts gender justice?

5:25pm–6:05pm Wednesday, September 25, 2019

Session

Case studies, Strata Business Summit

Secondary topics: Data quality, data governance and data lineage, Ethics, Health and Medicine

Brindaalakshmi K (Independent Consultant)

There's a lack of standards for the collection of gender data. Brindaalakshmi K takes a look at the implications of such a lack in the context of a developing country like India, the exclusion of individuals beyond the binary genders of male and female, and how this exclusion permeates beyond the public sector into private sector services. Read more.

Alexa, do men talk too much?

5:25pm–6:05pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI, Expo Hall

Secondary topics: Culture and Organization, Text and Language processing and analysis

Sireesha Muppala (Amazon Web Services), Shelbee Eigenbrode (Amazon Web Services), Emily Webber (Amazon Web Services)

Mansplaining. Know it? Hate it? Want to make it go away? Sireesha Muppala, Shelbee Eigenbrode, and Emily Webber tackle the problem of men talking over or down to women and its impact on career progression for women. They also demonstrate an Alexa skill that uses deep learning techniques on incoming audio feeds, examine ownership of the problem for women and men, and suggest helpful strategies. Read more.

Deploying end-to-end deep learning pipelines with ONNX

5:25pm–6:05pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Deep Learning, Model Development, Governance, Operations

Nick Pentreath (IBM)

The common perception of deep learning is that it results in a fully self-contained model. However, in most cases, these models have similar requirements for data preprocessing as does more "traditional" machine learning. Despite this, there are few standard solutions for deploying end-to-end deep learning. Nick Pentreath explores how the ONNX format and ecosystem address this challenge. Read more.

Data science and the business of Major League Baseball

5:25pm–6:05pm Wednesday, September 25, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Media and Advertising

Aaron Owen (Major League Baseball), Matthew Horton (Major League Baseball), Josh Hamilton (Major League Baseball)

Using SAS, Python, and AWS SageMaker, Major League Baseball's (MLB's) data science team outlines how it predicts ticket purchasers’ likelihood to purchase again, evaluates prospective season schedules, estimates customer lifetime value, optimizes promotion schedules, quantifies the strength of fan avidity, and monitors the health of monthly subscriptions to its game-streaming service. Read more.

HBase 2.0 and beyond

5:25pm–6:05pm Wednesday, September 25, 2019

Session

Data Engineering and Architecture, Streaming and IoT

Secondary topics: Deep dive into specific tools, platforms, or frameworks

Krishna Maheshwari (Cloudera)

Krishna Maheshwari provides an overview of the major features and enhancements in the HBase 2.0 release, upcoming releases, and the future of HBase. You'll be able to ask her questions at the end. Apache HBase 2.0 comes packed with a lot of new functionalities: off-heap read paths, multitier bucket cache, new finite state machine-based assignment manager, etc. Read more.

Executive Briefing: Making intelligent insights at the edge—The demise of big data?

5:25pm–6:05pm Wednesday, September 25, 2019

Session

Executive Briefing and best practices, Strata Business Summit

Secondary topics: Data, Analytics, and AI Architecture, Privacy and Security

Alasdair Allan (Babilim Light Industries)

The arrival of a new generation of smart embedded hardware may cause the demise of large-scale data harvesting. In its place, smart devices will let us process data at the edge and extract insights without storing potentially privacy- and GDPR-infringing data. Join Alasdair Allan to learn why the current age where privacy is no longer "a social norm" may not long survive the coming of the IoT. Read more.

How Nuveen rapidly integrated ESG data to advance its platform value (sponsored by Zaloni)

5:25pm–6:05pm Wednesday, September 25, 2019

Session

Sponsored

Ben Sharma (Zaloni), Santanu Sengupta (Nuveen)

Ben Sharma and Santanu Sengupta walk you through how to quickly integrate and accelerate environmental, social, and governance (ESG) data and third-party data into your environment to provide governed, trusted, and traceable data to portfolio managers and analysts in a self-service manner. Read more.

6:05pm

Booth Crawl

6:05pm–7:05pm Wednesday, September 25, 2019

Event

Make your way from booth to booth while you check out all the exhibitors in the Expo Hall on Wednesday after sessions end. Read more.

7:30pm

Data After Dark

7:30pm–10:30pm Wednesday, September 25, 2019

Event

Don't miss an exciting evening filled with cocktails, food, and entertainment at Data After Dark at Strata in New York. Read more.

Thursday, 09/26/2019

8:00am

Speed Networking

8:00am–8:30am Thursday, September 26, 2019

Event

Gather before keynotes on Thursday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with other attendees. Read more.

8:30am

8:30am–8:45am Thursday, September 26, 2019

Early morning coffee (8:00am - 8:45am) (15m)

8:45am

Thursday keynotes

8:45am–8:55am Thursday, September 26, 2019

Keynote

Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)

Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes. Read more.

8:55am

Staying safe in the AI era

8:55am–9:15am Thursday, September 26, 2019

Keynote

Cassie Kozyrkov (Google)

Machine learning and artificial intelligence are no longer science fiction, so now you have to address what it takes to harness their potential effectively, responsibly, and reliably. Based on lessons learned at Google, Cassie Kozyrkov offers actionable advice to help you find opportunities to take advantage of machine learning, navigate the AI era, and stay safe as you innovate. Read more.

9:15am

Unlocking the value of your data (sponsored by IBM)

9:15am–9:25am Thursday, September 26, 2019

Keynote

Daniel Hernandez (IBM)

Daniel Hernandez takes a deep dive into how, with a unified, prescriptive information architecture, organizations can successfully unlock the value of their data for an AI and multicloud world. Read more.

9:25am

Delivering the enterprise data cloud

9:25am–9:35am Thursday, September 26, 2019

Keynote

Arun Murthy (Cloudera)

In this keynote, we’ll introduce you to the new 100% open source Cloudera Data Platform (CDP), the world’s first enterprise data cloud. CDP is hybrid and multi-cloud, delivering the speed, agility, and scale you need to secure and govern your data anywhere from the edge to AI. Read more.

9:35am

Postrevolutionary big data: Promoting the general welfare (sponsored by Io-Tahoe)

9:35am–9:40am Thursday, September 26, 2019

Keynote

Barbara Eckman (Comcast)

Barbara Eckman shares lessons learned from early big data mistakes and the progress her team at Comcast is making toward a postrevolutionary big data vision. Read more.

9:40am

RL in real life: Bringing reinforcement learning to the enterprise (sponsored by Microsoft Azure)

9:40am–9:45am Thursday, September 26, 2019

Keynote

Edward Jezierski (Microsoft)

Microsoft has an ecosystem spanning research, gaming, and the cloud that's advancing reinforcement learning (RL) and putting it into everyday use. Join Edward Jezierski to see where RL is used practically across Microsoft and imagine the opportunities that exist for your business today. Read more.

9:45am

Strata Data Awards: Winners announced

9:45am–9:55am Thursday, September 26, 2019

Keynote

The Strata Data Awards recognize the most innovative startups, leaders, and data science projects from Strata sponsors and exhibitors around the world. Join us during keynotes for the announcement of the winners. Read more.

9:55am

Say what? The ethical challenges of designing for humanlike interaction

9:55am–10:15am Thursday, September 26, 2019

Keynote

Jonathan Foster (Microsoft)

Language shapes our thinking, our relationships, our sense of self. Conversation connects us in powerful, intimate, and often unconscious ways. Jonathan Foster explains why, as we design for natural language interactions and more humanlike digital experiences, language—as design material, conversation, and design canvas—reveals ethical challenges we couldn't encounter with GUI-powered experiences. Read more.

10:15am

Data Science Pioneers: Conquering the next frontier, a documentary investigating the future of data science (sponsored by Dataiku)

10:15am–10:20am Thursday, September 26, 2019

Keynote

Jed Dougherty (Dataiku)

Jed Dougherty presents the trailer of the upcoming _Data Science Pioneers_ documentary about the passionate data scientists driving us toward technological revolution. Cut through the hype with _Data Science Pioneers_ and see what it really means to be a data scientist. Read more.

10:20am

Data sonification: Making music from the yield curve

10:20am–10:40am Thursday, September 26, 2019

Keynote

Alan Smith (Financial Times)

Based on a critical evaluation of the iconic yield curve chart, Alan Smith argues that combining visualization (data to pixels) with sonification (data to pitch) offers the potential not only to improve aesthetic multimedia experiences but also to take the presentation of data into the rapidly expanding universe of screenless devices and products. Read more.

10:40am

Closing remarks

10:40am–10:45am Thursday, September 26, 2019

Keynote

Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)

Program chairs Ben Lorica, Doug Cutting, and Alistair Croll offer closing remarks. Read more.

10:50am

10:50am–11:20am Thursday, September 26, 2019

Morning break sponsored by Cisco (30m)

11:20am

The key to climbing the AI ladder (sponsored by IBM)

11:20am–12:00pm Thursday, September 26, 2019

Session

Sponsored

Daniel Hernandez (IBM)

AI isn't magic. It’s still hard work. Daniel Hernandez explains why having the technology alone isn't enough; it requires a thoughtful and well-architected approach. Read more.

Deliver personalized experiences and content like Xbox with Cognitive Services Personalizer (sponsored by Microsoft Azure)

11:20am–12:00pm Thursday, September 26, 2019

Session

Sponsored

Edward Jezierski (Microsoft), Jackie Nichols (Microsoft)

Edward Jezierski and Jackie Nichols demonstrate how Cognitive Services Personalizer works with your content and data, how it autonomously learns to make optimal decisions, how you can add it to your app with two lines of code, and what’s under the hood. Then they share the results Personalizer achieved on the Xbox One home page as well as best practices for applying it in your applications today. Read more.

Organizing the chaos of healthcare with smart data discovery (sponsored by Io-Tahoe)

11:20am–12:00pm Thursday, September 26, 2019

Session

Sponsored

Charles Boicey (Clearsense)

Healthcare’s reliance on comprehensible data is critical to the mission of providing optimal and affordable care. Charles Boicey takes a deep dive into how the application of technology, such as machine learning, is paramount to the modernization of healthcare that provides its professionals with fully integrated and complete medical records. Read more.

Your cloud, your ML, but more and more scale? How SurveyMonkey did it

11:20am–12:00pm Thursday, September 26, 2019

Session

Data Engineering and Architecture

Secondary topics: Cloud Platforms and SaaS, Data, Analytics, and AI Architecture, Media and Advertising

Jing Huang (SurveyMonkey), Jessica Mong (SurveyMonkey)

You're a SaaS company operating on a cloud infrastructure prior to the machine learning (ML) era and you need to successfully extend your existing infrastructure to leverage the power of ML. Jing Huang and Jessica Mong detail a case study with critical lessons from SurveyMonkey’s journey of expanding its ML capabilities with its rich data repo and hybrid cloud infrastructure. Read more.

Transforming financial reporting services with massively scalable OLAP (sponsored by Kyvos Insights)

11:20am–12:00pm Thursday, September 26, 2019

Session

Sponsored

Ajay Anand (Kyvos Insights)

Learn how you can overcome the challenges of traditional OLAP solutions and scale BI to deliver quick insights to business users across your enterprise. Read more.

A practical guide to algorithmic bias and explainability in machine learning

11:20am–12:00pm Thursday, September 26, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Ethics

Alejandro Saucedo (The Institute for Ethical AI & Machine Learning)

Alejandro Saucedo demystifies AI explainability through a hands-on case study, where the objective is to automate a loan-approval process by building and evaluating a deep learning model. He introduces motivations through the practical risks that arise with undesired bias and black box models and shows you how to tackle these challenges using tools from the latest research and domain knowledge. Read more.

Online machine learning in streaming applications

11:20am–12:00pm Thursday, September 26, 2019

Session

Data Engineering and Architecture, Streaming and IoT

Secondary topics: Streaming and IoT, Temporal data and time-series analytics

Stavros Kontopoulos (Lightbend), Debasish Ghosh (Lightbend)

Stavros Kontopoulos and Debasish Ghosh explore online machine learning algorithm choices for streaming applications, especially those with resource-constrained use cases like IoT and personalization. They dive into Hoeffding Adaptive Trees, classic sketch data structures, and drift detection algorithms from implementation to production deployment, describing the pros and cons of each of them. Read more.

Performant time series data management and analytics with PostgreSQL

11:20am–12:00pm Thursday, September 26, 2019

Session

Data Engineering and Architecture

Secondary topics: Data Management and Storage, Streaming and IoT

Michael Freedman (TimescaleDB | Princeton University)

Leveraging polyglot solutions for your time series data can lead to issues including engineering complexity, operational challenges, and even referential integrity concerns. Michael Freedman explains why, by re-engineering PostgreSQL to serve as a general data platform, your high-volume time series workloads will be better streamlined, resulting in more actionable data and greater ease of use. Read more.

Where's my lookup table? Modeling relational data in a denormalized world

11:20am–12:00pm Thursday, September 26, 2019

Session

Data Engineering and Architecture

Secondary topics: Cloud Platforms and SaaS, Data Management and Storage

Rick Houlihan (Amazon Web Services)

Data has always been and will always be relational. NoSQL databases are gaining in popularity, but that doesn't change the fact that the data is still relational; it just changes how we have to model the data. Rick Houlihan dives deep into how real entity relationship models can be efficiently modeled in a denormalized manner using schema examples from real application services. Read more.

Executive Briefing: Say what? The ethical challenges of designing for humanlike interaction

11:20am–12:00pm Thursday, September 26, 2019

Session

Strata Business Summit

Jonathan Foster (Microsoft)

How Deutsche Bank industrialized AI and machine learning

11:20am–12:00pm Thursday, September 26, 2019

Session

Data Science, Machine Learning, & AI, Strata Business Summit

John Allen (Deutsche Bank)

As an early adopter of data science, machine learning, and AI, Deutsche Bank's analytics function is trailblazing new ways to drive revenues, lower costs, and reduce risk across all areas of the group. John Allen shares how his team combines commercial offerings with open source technologies to revolutionize legacy processes and transform the way the bank uses technology to drive innovation. Read more.

ML is not enough: Decision automation in the real world

11:20am–12:00pm Thursday, September 26, 2019

Session

Data Science, Machine Learning, & AI, Expo Hall

Secondary topics: Culture and Organization, Retail and e-commerce, Transportation and Logistics

Brian Keng (Rubikloud)

Automating decisions requires a system to consider more than just a data-driven prediction. Real-world decisions require additional constraints and fuzzy objectives to ensure they're robust and consistent with business goals. Brian Keng takes a deep dive into how to leverage modern machine learning methods and traditional mathematical optimization techniques for decision automation. Read more.

Getting to know the elephant: Real-time debugging and visualization for deep learning

11:20am–12:00pm Thursday, September 26, 2019

Session

Data Science, Machine Learning, & AI

Shital Shah (Microsoft Research)

Taming massive deep learning models, data, and training times requires a new way of thinking. Shital Shah explores new tools and methods to better understand AI. Explaining the decisions made by AI not only helps us accelerate its development but also makes it safer and more trustworthy. Read more.

Working with time series: Denoising and imputation frameworks to improve data density

11:20am–12:00pm Thursday, September 26, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Financial Services, Temporal data and time-series analytics

Anjali Samani (CircleUp)

The application of smoothing and imputation strategies is common practice in predictive modeling and time series analysis. With a technique-agnostic approach, Anjali Samani provides qualitative and quantitative frameworks that address questions related to smoothing and imputation of missing values to improve data density. Read more.

Using Spark for crunching astronomical data on the LSST scale

11:20am–12:00pm Thursday, September 26, 2019

Session

Data Engineering and Architecture

Secondary topics: Data Integration and Data Processing

Petar Zecevic (SV Group)

The Large Synoptic Survey Telescope (LSST) is one of the most important future surveys. Its unique design allows it to cover large regions of the sky and obtain images of the faintest objects. After 10 years of operation, it will produce about 80 PB of data in images and catalog data. Petar Zecevic explains AXS, a system built for fast processing and cross-matching of survey catalog data. Read more.

Executive Briefing: Creating a center for data science from scratch—Lessons from nonprofit research

11:20am–12:00pm Thursday, September 26, 2019

Session

Culture and organization, Strata Business Summit

Secondary topics: Culture and Organization

Gayle Bieler (RTI International)

Gayle Bieler explains how she built a thriving center for data science within a large, well-respected nonprofit research institute and shares some of its most impactful projects and best adventures to date, which have solved important national problems, improved local communities, and transformed research. Read more.

12:00pm

12:00pm–1:15pm Thursday, September 26, 2019

Break (1h 15m)

Thursday Business Summit Lunch

12:00pm–1:15pm Thursday, September 26, 2019

Event

Join Strata Business Summit speakers and attendees for a networking lunch on Thursday. Read more.

Thursday Topic Tables at Lunch (sponsored by IBM)

12:00pm–1:15pm Thursday, September 26, 2019

Event

Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

12:30pm

Why AI fails: Overcoming AI challenges (sponsored by IBM)

12:30pm–1:10pm Thursday, September 26, 2019

Session

Brittany Bogle (IBM)

AI will be the most disruptive class of technologies over the next decade, fueled by near-endless amounts of data and unprecedented advances in deep learning. Brittany Bogle walks you through how to address some of the major AI challenges, like trust, talent, and data. Read more.

1:15pm

So you built a model; now what? (sponsored by Dataiku)

1:15pm–1:55pm Thursday, September 26, 2019

Session

Sponsored

Jed Dougherty (Dataiku)

Jed Dougherty takes a deep dive into an often overlooked aspect of the data science lifecycle: model deployment. Once they’ve constructed a data science model that does a good job accurately predicting their test set, many data scientists think the job is over. But really, it’s just begun. Read more.

Migrating Hadoop analytics to Spark in the cloud without disruption (sponsored by WANdisco)

1:15pm–1:55pm Thursday, September 26, 2019

Session

Sponsored

Paul Scott-Murphy (WANdisco)

Paul Scott-Murphy dives into the options that exist for cloud migration and their advantages and disadvantages, what cloud vendors do and don't offer to support large-scale migration, the business risks associated with large-scale cloud migration, and how to migrate analytics data at scale for immediate use in Spark without disrupting on-premises operations. Read more.

Next-generation serverless data architecture for insights at the speed of thought (sponsored by Actian)

1:15pm–1:55pm Thursday, September 26, 2019

Session

Sponsored

Paul Wolmering (Actian)

Paul Wolmering explores the key characteristics for building an Agile data warehouse and defines a reference architecture for hybrid data. Read more.

Managing your Kafka in an explosive growth environment

1:15pm–1:55pm Thursday, September 26, 2019

Session

Data Engineering and Architecture

Secondary topics: Data Management and Storage, Deep dive into specific tools, platforms, or frameworks

Alon Gavra (AppsFlyer)

Kafka is frequently just a piece of the production stack that no one wants to touch because it just works. Alon Gavra outlines how Kafka sits at the core of AppsFlyer's infrastructure that processes billions of events daily. Read more.

The end of applications: How data collaboration is changing everything (sponsored by Cinchy)

1:15pm–1:55pm Thursday, September 26, 2019

Session

Sponsored

Dan DeMers (Cinchy)

After 40 years of apps, enterprise companies now realize that building or buying an application for every use case has become a major threat to their ability to leverage and protect their core data assets. Dan DeMers provides a live demo of Cinchy, the world’s first data collaboration platform. Read more.

Data need not be a moat: Mixed formal learning enables zero- and low-shot learning

1:15pm–1:55pm Thursday, September 26, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Text and Language processing and analysis

Sandra Carrico (GLYNT)

Sandra Carrico explains mixed formal learning and outlines one machine learning example that previously required large numbers of examples and now learns with either zero or a handful of training examples. She maps apparently idiosyncratic techniques to mixed formal learning, a general AI architecture that you can use in your projects. Read more.

Problems taking AI to production and how to fix them

1:15pm–1:55pm Thursday, September 26, 2019

Session

Data Engineering and Architecture

Secondary topics: Model Development, Governance, Operations

Jim Scott (NVIDIA)

Data scientists create and test hundreds or thousands more models than in the past. Models require support from both real-time and static data sources. As data becomes enriched and parameters are tuned and explored, there's a need for versioning everything, including the data. Jim Scott examines these specific problems and approaches to fix them. Read more.

How to performance-tune Spark applications in large clusters

1:15pm–1:55pm Thursday, September 26, 2019

Session

Data Engineering and Architecture

Secondary topics: Deep dive into specific tools, platforms, or frameworks, Transportation and Logistics

Omkar Joshi (Uber), Bo Yang (Uber)

Omkar Joshi and Bo Yang offer an overview of how Uber’s ingestion (Marmaray) and observability team improved performance of Apache Spark applications running on thousands of cluster machines and across hundreds of thousands of applications and how the team methodically tackled these issues. They also cover how they used Uber’s open-sourced jvm-profiler for debugging issues at scale. Read more.

Intelligent design patterns for cloud-based analytics and BI

1:15pm–1:55pm Thursday, September 26, 2019

Session

Business Analytics and Visualization, Data Engineering and Architecture

Secondary topics: BI, Interactive Analytics and Visualization

Shant Hovsepian (Arcadia Data)

With cloud object storage (e.g., S3, ADLS) one expects business intelligence (BI) applications to benefit from the scale of data and real-time analytics. However, traditional BI in the cloud surfaces nonobvious challenges. Shant Hovsepian examines service-oriented cloud design (storage, compute, catalog, security, SQL) and how native cloud BI provides analytic depth, low cost, and performance. Read more.

An in-depth look at the data science career: Defining roles, assessing skills

1:15pm–1:55pm Thursday, September 26, 2019

Session

Culture and organization, Strata Business Summit

Secondary topics: Culture and Organization

Usama Fayyad (Open Insights & OODA Health, Inc.), Hamit Hamutcu (Analytics Center)

If you've ever been confused about what it takes to be a data scientist or curious about how companies recruit, train, and manage analytics resources, Usama Fayyad and Hamit Hamutcu are here to explore insights from the most comprehensive research effort to date on the data analytics profession and propose a framework for the standardization of roles and methods for assessing skills. Read more.

Communication breakdown: Facing machine learning’s all-too-human failure

1:15pm–1:55pm Thursday, September 26, 2019

Session

Executive Briefing and best practices, Strata Business Summit

James Kotecki (Infinia ML)

Miscommunication between business leaders and technical experts can doom even the best data science project. Don’t let it drive you insane! In this session, we’ll dissect many flavors of communication failure, from goal misalignment to technical misunderstanding. Then, we’ll explore practical ways to bridge these gaps. Read more.

Handtrack.js: Building gesture-based interactions in the browser using TensorFlow

1:15pm–1:55pm Thursday, September 26, 2019

Session

Data Science, Machine Learning, & AI, Expo Hall

Secondary topics: Deep dive into specific tools, platforms, or frameworks, Deep Learning

Victor Dibia (Cloudera Fast Forward Labs)

Recent advances in machine learning frameworks for the browser such as TensorFlow provide the opportunity to craft truly novel experiences within frontend applications. Victor Dibia explores the state of the art for machine learning in the browser using TensorFlow and outlines its use in the design of Handtrack.js—a library for prototyping real-time hand detection in the browser. Read more.

Scaling Apache Spark at Facebook

1:15pm–1:55pm Thursday, September 26, 2019

Session

Data Science, Machine Learning, & AI

Sameer Agarwal (Facebook), Ankit Agarwal (Facebook Inc.)

Apache Spark is the largest compute engine at Facebook by CPU. Sameer Agarwal dives into the story of how Facebook optimized, tuned, and scaled Apache Spark to run on clusters of tens of thousands of machines, processing hundreds of petabytes of data, and being used by thousands of data scientists, engineers, and product analysts every day. Read more.

Handling data gaps in time series using imputation

1:15pm–1:55pm Thursday, September 26, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Temporal data and time-series analytics

Alfred Whitehead (Klick), Clare Jeon (Klick)

Time series forecasts depend on sensors or measurements made in the real, messy world. The sensors flake out, get turned off, disconnect, and otherwise conspire to cause missing signals: signals that may tell you what tomorrow's temperature will be or what your blood glucose levels are before bed. Alfred Whitehead and Clare Jeon explore methods for handling data gaps and when to consider which. Read more.

The hitchhiker’s guide to the cloud: Architecting for the cloud through customer stories

1:15pm–1:55pm Thursday, September 26, 2019

Session

Data Engineering and Architecture

Secondary topics: Cloud Platforms and SaaS, Data, Analytics, and AI Architecture

Sushant Rao (Cloudera)

Jason Wang and Sushant Rao offer an overview of cloud architecture, then go into detail on core cloud paradigms like compute (virtual machines), cloud storage, authentication and authorization, and encryption and security. They conclude by bringing these concepts together through customer stories to demonstrate how real-world companies have leveraged the cloud for their big data platforms. Read more.

Executive Briefing: Lessons from the front lines—Building a responsible AI/ML program in the enterprise

1:15pm–1:55pm Thursday, September 26, 2019

Session

Executive Briefing and best practices, Strata Business Summit

Secondary topics: Culture and Organization, Ethics

Keegan Hines (Capital One)

This talk will explore some of the philosophy around the concept of explaining a model, given that the colloquial definition is partially recursive. It will cover the lens banking regulation places on this philosophical basis and expand into techniques used for these well-governed aspects. Read more.

2:05pm

Powering the future with data intelligence (sponsored by Collibra)

2:05pm–2:45pm Thursday, September 26, 2019

Session

Sponsored

Jim Cushman (Collibra), Piyush Jain (Progressive)

Transforming data into a trusted business asset that informs decision making requires giving teams access to a powerful platform that makes it easy to harness data across the enterprise. Jim Cushman and Piyush Jain detail how Progressive uses Collibra to transform the way data is managed and used across the organization, driving real business value. Read more.

Stream processing beyond streaming data

2:05pm–2:45pm Thursday, September 26, 2019

Session

Data Engineering and Architecture, Streaming and IoT

Secondary topics: Deep dive into specific tools, platforms, or frameworks, Streaming and IoT

Stephan Ewen (Ververica)

Stephan Ewen details how stream processing is becoming a "grand unifying paradigm" for data processing and the newest developments in Apache Flink to support this trend: new cross-batch-streaming machine learning algorithms, state-of-the-art batch performance, and new building blocks for data-driven applications and application consistency. Read more.

Getting clinical trial data ready for analysis: How IQVIA wrangled its way to success (sponsored by Trifacta)

2:05pm–2:45pm Thursday, September 26, 2019

Session

Sponsored

Matt Derda (Trifacta), Yogesh Prasad (IQVIA)

Clinical trial data analysis can be a complex process. The data is typically hand-coded and formatted differently and is required to be delivered in an FDA-approved format. Matt Derda and Yogesh Prasad explain how IQVIA built its Clean Patient Tracker and how it enabled agility and flexibility for end users of the platform, from data acquisition to reporting and analytics. Read more.

Posttransaction processing using Apache Pulsar at Narvar

2:05pm–2:45pm Thursday, September 26, 2019

Session

Data Engineering and Architecture, Streaming and IoT

Secondary topics: Data Integration and Data Processing, Data, Analytics, and AI Architecture, Retail and e-commerce, Streaming and IoT

Davor Bonaci (Kaskada), Anand Madhavan (Narvar)

Narvar provides a next-generation posttransaction experience for over 500 retailers. Karthik Ramasamy and Anand Madhavan take you on the journey of how Narvar moved away from using a slew of technologies for its platform and consolidated its use cases using Apache Pulsar. Read more.

Automating ML model training and deployments via metadata-driven data, infrastructure, feature engineering, and model management

2:05pm–2:45pm Thursday, September 26, 2019

Session

Automation in data science and data, Data Science, Machine Learning, & AI

Secondary topics: Data quality, data governance and data lineage, Media and Advertising, Model Development, Governance, Operations

Mumin Ransom (Comcast), Nick Pinckernell (Comcast)

Mumin Ransom gives an overview of the data management and privacy challenges around automating ML model (re)deployments and stream-based inferencing at scale. Read more.

The new SDLC: CI/CD in the age of machine learning

2:05pm–2:45pm Thursday, September 26, 2019

Session

Automation in data science and data, Data Engineering and Architecture

Secondary topics: Model Development, Governance, Operations

Diego Oppenheimer (Algorithmia)

Machine learning (ML) will fundamentally change the way we build and maintain applications. Diego Oppenheimer dives into how you can adapt your infrastructure, operations, staffing, and training to meet the challenges of the new software development life cycle (SDLC) without throwing away everything that already works. Read more.

Creating an extensible 100+ PB real-time big data platform by unifying storage and serving

2:05pm–2:45pm Thursday, September 26, 2019

Session

Data Engineering and Architecture

Secondary topics: Data Integration and Data Processing, Data Management and Storage, Data, Analytics, and AI Architecture, Transportation and Logistics

Reza Shiftehfar (Uber)

Building a reliable big data platform is extremely challenging when it has to store and serve hundreds of petabytes of data in real time. Reza Shiftehfar reflects on the challenges faced and proposes architectural solutions to scale a big data platform to ingest, store, and serve 100+ PB of data with minute-level latency while efficiently utilizing the hardware and meeting security needs. Read more.

Securing your cloud data lake with a "defense in depth" approach

2:05pm–2:45pm Thursday, September 26, 2019

Session

Data Engineering and Architecture

Secondary topics: Cloud Platforms and SaaS, Privacy and Security

Tomer Shiran (Dremio), Jacques Nadeau (Dremio)

With cheap and scalable storage services such as S3 and ADLS, it's never been easier to dump data into a cloud data lake. But you still need to secure that data and be sure it doesn't leak. Tomer Shiran and Jacques Nadeau explore capabilities for securing a cloud data lake, including authentication, access control, encryption (in motion and at rest), and auditing, as well as network protections. Read more.

T-Mobile's journey to turn crowdsourced big data into actionable insights

2:05pm–2:45pm Thursday, September 26, 2019

Session

Case studies, Strata Business Summit

Secondary topics: BI, Interactive Analytics and Visualization, Telecom

Alex Yoon (T-Mobile)

T-Mobile successfully improved the quality of voice calling by analyzing crowdsourced big data from mobile devices. Alex Yoon walks you through how engineers from multiple backgrounds collaborated to achieve a 10% improvement in voice quality and why the analysis of big data was the key to bringing better voice call service quality to millions of end users. Read more.

ThirdEye: LinkedIn’s business-wide monitoring platform

2:05pm–2:45pm Thursday, September 26, 2019

Session

Business Analytics and Visualization, Strata Business Summit

Secondary topics: BI, Interactive Analytics and Visualization, Media and Advertising, Temporal data and time-series analytics

Akshay Rai (LinkedIn)

Failures or issues in a product or service can negatively affect the business. Detecting issues in advance and recovering from them is crucial to keeping the business alive. Join Akshay Rai to learn more about LinkedIn's next-generation open source monitoring platform, an integrated solution for real-time alerting and collaborative analysis. Read more.

Machine learning for streaming data: Practical insights

2:05pm–2:45pm Thursday, September 26, 2019

Session

Data Science, Machine Learning, & AI, Expo Hall

Secondary topics: Streaming and IoT, Telecom, Temporal data and time-series analytics

Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)

Heitor Murilo Gomes and Albert Bifet introduce you to a machine learning pipeline for streaming data using the streamDM framework. You'll also learn how to use streamDM for supervised and unsupervised learning tasks, see examples of online preprocessing methods, and discover how to expand the framework by adding new learning algorithms or preprocessing methods. Read more.

Learning asset naming patterns to find risky unmanaged devices

2:05pm–2:45pm Thursday, September 26, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Deep Learning, Streaming and IoT

Ryan Foltz (Exabeam)

Unmanaged and foreign devices in corporate networks pose a security risk, and the first step toward reducing this risk is the ability to identify them. Ryan Foltz walks you through a comprehensive device management machine learning model based on deep learning that performs anomaly detection based only on device names to flag devices that do not follow naming structures. Read more.

When Holt-Winters is better than machine learning

2:05pm–2:45pm Thursday, September 26, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Temporal data and time-series analytics

Anais Dotis (InfluxData)

Machine learning (ML) gets a lot of hype, but its classical predecessors are still immensely powerful, especially in the time series space, where classical algorithms can outperform machine learning methods in forecasting. Anais Dotis dives into how she used the Holt-Winters forecasting algorithm to predict water levels in a creek. Read more.

Fuzzy matching and deduplicating data: Techniques for advanced data prep

2:05pm–2:45pm Thursday, September 26, 2019

Session

Data Engineering and Architecture

Secondary topics: Data Integration and Data Processing, Data quality, data governance and data lineage

Nikki Rouda (Amazon Web Services), Janisha Anand (Amazon Web Services)

Nikki Rouda and Janisha Anand demonstrate how to deduplicate or link records in a dataset, even when the records don’t have a common unique identifier and no fields match exactly. You'll also learn how to link customer records across different databases, match external product lists against your own catalog, and solve tough challenges to prepare and cleanse data for analysis. Read more.

Executive Briefing: Unpacking AutoML

2:05pm–2:45pm Thursday, September 26, 2019

Session

Strata Business Summit

Paco Nathan (derwen.ai)

Paco Nathan outlines the history and landscape for vendors, open source projects, and research efforts related to AutoML. Starting from the perspective of an AI expert practitioner who speaks business fluently, Paco unpacks the ground truth of AutoML—translating from the hype into business concerns and practices in a vendor-neutral way. Read more.

2:45pm

2:45pm–3:45pm Thursday, September 26, 2019

Afternoon break sponsored by Io-Tahoe (1h)

3:45pm

SK Telecom's 5G network monitoring and 3D visualization on streaming technologies

3:45pm–4:25pm Thursday, September 26, 2019

Session

Data Engineering and Architecture, Streaming and IoT

Secondary topics: Data Integration and Data Processing, Data, Analytics, and AI Architecture, Streaming and IoT, Telecom

Jonghyok Lee (SK Telecom), Chon Yong Lee (SK Telecom)

Jonghyok Lee and Chon Yong Lee discuss T-CORE, SK Telecom’s monitoring and service analytics platform, which collects system and application data from several thousand servers and applications and provides a 3D visualization of the real-time status of the whole network. Join in to hear lessons learned during development. Read more.

An introduction to machine learning on graphs

3:45pm–4:25pm Thursday, September 26, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Financial Services

David Mack (Octavian)

Graphs are a powerful way to represent knowledge. Organizations, in fields such as biosciences and finance, are starting to amass large knowledge graphs, but they lack the machine learning tools to extract insights from them. David Mack offers an overview of what insights are possible and surveys the most popular approaches. Read more.

ML ops: Applying DevOps practices to machine learning workloads

3:45pm–4:25pm Thursday, September 26, 2019

Session

Automation in data science and data, Data Engineering and Architecture

Secondary topics: Cloud Platforms and SaaS, Deep dive into specific tools, platforms, or frameworks, Model Development, Governance, Operations

Sireesha Muppala (Amazon Web Services), Shelbee Eigenbrode (Amazon Web Services), Randall DeFauw (Amazon Web Services)

As an increasing level of automation becomes available to data science, the balance between automation and quality needs to be maintained. Applying DevOps practices to machine learning workloads brings models to the market faster and maintains the quality and integrity of those models. Sireesha Muppala, Shelbee Eigenbrode, and Randall DeFauw explore applying DevOps practices to ML workloads. Read more.

Enabling big data and AI workloads on the object store at DBS Bank

3:45pm–4:25pm Thursday, September 26, 2019

Session

Data Engineering and Architecture

Secondary topics: Cloud Platforms and SaaS, Data Management and Storage, Data, Analytics, and AI Architecture, Financial Services

Vitaliy Baklikov (DBS Bank), Dipti Borkar (Alluxio)

Vitaliy Baklikov and Dipti Borkar explore how DBS Bank built a modern big data analytics stack leveraging an object store even for data-intensive workloads like ATM forecasting and how it uses Alluxio to orchestrate data locality and data access for Spark workloads. Read more.

Protect your private data in your Hadoop clusters with ORC column encryption

3:45pm–4:25pm Thursday, September 26, 2019

Session

Data Engineering and Architecture, Security and Privacy

Secondary topics: Deep dive into specific tools, platforms, or frameworks, Privacy and Security

Owen O'Malley (Cloudera)

Fine-grained data protection at a column level in data lake environments has become a mandatory requirement to demonstrate compliance with multiple local and international regulations across many industries today. Owen O'Malley dives into how column encryption in ORC files enables both fine-grained protection and audits of who accessed the private data. Read more.

Migrating millions of users from voice- and email-based customer support to a chatbot

3:45pm–4:25pm Thursday, September 26, 2019

Session

Case studies, Strata Business Summit

Secondary topics: Text and Language processing and analysis, Transportation and Logistics

Madhu Gopinathan (MakeMyTrip), Sanjay Mohan (MakeMyTrip)

At MakeMyTrip, customers were using voice or email to contact agents for postsale support. To improve agent efficiency and customer experience, MakeMyTrip developed a chatbot, Myra, using some of the latest advances in deep learning. Madhu Gopinathan and Sanjay Mohan explain the high-level architecture and the business impact Myra created. Read more.

Purposefully designing technology for civic engagement

3:45pm–4:25pm Thursday, September 26, 2019

Session

Law and Ethics, Strata Business Summit

Secondary topics: BI, Interactive Analytics and Visualization, Ethics

Audrey Lobo-Pulo (Phoensight), Annette Hester (National Energy Board, Canada)

As new digital platforms emerge and governments look at new ways to engage with citizens, there's an increasing awareness of the role these platforms play in shaping public participation and democracy. Audrey Lobo-Pulo, Annette Hester, and Ryan Hum examine the design attributes of civic engagement technologies and their ensuing impacts, along with an NEB Canada case study. Read more.

Deep learning on Apache Spark at CERN’s Large Hadron Collider with Analytics Zoo

3:45pm–4:25pm Thursday, September 26, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Deep Learning

Sajan Govindan (Intel)

Sajan Govindan outlines CERN’s research on deep learning in high energy physics experiments as an alternative to customized rule-based methods with an example of topology classification to improve real-time event selection at the Large Hadron Collider. CERN uses deep learning pipelines on Apache Spark using BigDL and Analytics Zoo open source software on Intel Xeon-based clusters. Read more.

Soss: Lightweight probabilistic programming in Julia

3:45pm–4:25pm Thursday, September 26, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Deep dive into specific tools, platforms, or frameworks

Chad Scherrer (Metis)

Chad Scherrer explores the basic ideas in Soss, a new probabilistic programming library for Julia. Soss allows a high-level representation of the kinds of models often written in PyMC3 or Stan, and offers a way to programmatically specify and apply model transformations like approximations or reparameterizations. Read more.

Lessons learned from scaling the tech stack of a modern analytics platform

3:45pm–4:25pm Thursday, September 26, 2019

Session

Data Engineering and Architecture

Secondary topics: Data, Analytics, and AI Architecture

Scott Castle (Sisense)

In this session, Scott Castle, General Manager at Sisense and former VP of Product at Periscope Data, will discuss lessons learned from scaling up Periscope Data to support incredibly large volumes of data and queries from its data teams. Read more.

Executive Briefing: Building a culture of self-service from predeployment to continued engagement

3:45pm–4:25pm Thursday, September 26, 2019

Session

Culture and organization, Strata Business Summit

Secondary topics: Culture and Organization, Transportation and Logistics

Jonathan Tudor (GE Aviation), Ross Schalmo (GE Aviation)

Jonathan Tudor and Ross Schalmo explore how GE Aviation made it a mission to implement self-service data. To ensure success beyond initial implementation of tools, the data engineering and analytics teams created initiatives to foster engagement, from an ongoing partnership with each part of the business to the gamification of tagging data in a data catalog to the formation of a published dataset council. Read more.

4:35pm

Bridging the gap between big data computing and high-performance computing

4:35pm–5:15pm Thursday, September 26, 2019

Session

Data Engineering and Architecture

Secondary topics: Data, Analytics, and AI Architecture

Supun Kamburugamuve (Indiana University)

Big data computing and high-performance computing (HPC) evolved over the years as separate paradigms. With the explosion of the data and the demand for machine learning algorithms, these two paradigms increasingly embrace each other for data management and algorithms. Supun Kamburugamuve explores the possibilities and tools available for getting the best of HPC and big data. Read more.

Using Spark to speed up the diagnosis performance for big data applications

4:35pm–5:15pm Thursday, September 26, 2019

Session

Data Engineering and Architecture

Ruixin Xu (Microsoft), Long Tian (Microsoft), Yu Zhou (Microsoft)

Ruixin Xu, Long Tian, and Yu Zhou explore an experiment run using Spark and Jupyter notebooks as a replacement for existing IDE-based tools for internal DevOps. The Spark-based solution improved the diagnosis performance significantly, especially for a complex job with a large profile, and leveraging Jupyter notebooks brings the benefits of fast iteration and easy knowledge sharing. Read more.

Combining creativity and analytics

4:35pm–5:15pm Thursday, September 26, 2019

Session

Strata Business Summit

David Boyle (Audience Strategies)

Companies that harness creativity and data in tandem have growth rates twice as high as companies that don’t. David Boyle shares lessons from his successes and failures in trying to do just that across presidential politics, with pop stars, and with power brands in the world of luxury goods. Join in to find out how analysts can work differently to build these partnerships and unlock this growth. Read more.

Executive Briefing: Big data in the era of heavy worldwide privacy regulations

4:35pm–5:15pm Thursday, September 26, 2019

Session

Executive Briefing and best practices, Strata Business Summit

Secondary topics: Privacy and Security

Mark Donsky (Okera)

California is following the EU's GDPR with the California Consumer Privacy Act (CCPA) in 2020. There are penalties for non-compliance, but many companies aren't prepared for this strict regulation. This session will explore the capabilities your data environment needs in order to simplify CCPA and GDPR compliance, as well as other regulations. Read more.

Deep learning technologies for giant hogweed eradication

4:35pm–5:15pm Thursday, September 26, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Deep Learning

Naoto Umemori (NTT DATA), Masaru Dobashi (NTT DATA)

Giant hogweed is a highly toxic plant. Naoto Umemori and Masaru Dobashi aim to automate the process of detecting the plant with technologies like drones and image recognition and detection using machine learning. You'll see how they designed the architecture, took advantage of big data and machine and deep learning technologies (e.g., Hadoop, Spark, and TensorFlow), and the lessons they learned. Read more.

Scalable anomaly detection with Spark and SOS

4:35pm–5:15pm Thursday, September 26, 2019

Session

Data Science, Machine Learning, & AI

Secondary topics: Temporal data and time-series analytics

Jeroen Janssens (Data Science Workshops)

Jeroen Janssens dives into stochastic outlier selection (SOS), an unsupervised algorithm for detecting anomalies in large, high-dimensional data. SOS has been implemented in Python, R, and, most recently, Spark. He illustrates the idea and intuition behind SOS, demonstrates the implementation of SOS on top of Spark, and applies SOS to a real-world use case. Read more.

Spark on Kubernetes for data science

4:35pm–5:15pm Thursday, September 26, 2019

Session

Data Science, Machine Learning, & AI

Jordan Volz (Dataiku)

Spark on Kubernetes is a winning combination for data science that stitches together a flexible platform harnessing the best of both worlds. Jordan Volz gives a brief overview of Spark and Kubernetes, the Spark on Kubernetes project, why it’s an ideal fit for data scientists who may have been dissatisfied with other iterations of Spark in the past, and some applications. Read more.

Executive Briefing: What it takes to use machine learning in fast data pipelines

4:35pm–5:15pm Thursday, September 26, 2019

Session

Executive Briefing and best practices, Strata Business Summit

Secondary topics: Data, Analytics, and AI Architecture, Streaming and IoT

Dean Wampler (Anyscale)

Dean Wampler dives into how (and why) to integrate ML into production streaming data pipelines and to serve results quickly; how to bridge data science and production environments with different tools, techniques, and requirements; how to build reliable and scalable long-running services; and how to update ML models without downtime. Read more.